Lambdas and Streams

Item42: Prefer lambdas to anonymous classes

The most common use of an anonymous class is to create a function object on the fly:

Collections.sort(words, new Comparator<String>() {
  public int compare(String s1, String s2) {
  	return Integer.compare(s1.length(), s2.length());
  }
});

It can now be replaced by a lambda expression:

Collections.sort(words, (s1, s2) -> Integer.compare(s1.length(), s2.length()));

Note that the types of the lambda (Comparator), of its parameters (s1 and s2, both String), and of its return value (int) are not present in the code. The compiler deduces these types from context, using a process known as type inference. So the rule is: Omit the types of all lambda parameters unless their presence makes your program clearer.

The comparator can be made even shorter by using a comparator construction method (note the static import of Comparator.comparingInt):

Collections.sort(words, comparingInt(String::length));

And because a sort method was added to the List interface in Java 8, the snippet can be made shorter still:

words.sort(comparingInt(String::length));

Lambdas can be used anywhere a functional interface is expected. But lambdas lack names and documentation; if a computation isn't self-explanatory, or exceeds a few lines, don't put it in a lambda. Also, lambdas are limited to functional interfaces: if you want an instance of an abstract class, an anonymous class is your only choice, and anonymous classes can also create instances of interfaces with multiple abstract methods. Finally, a lambda cannot obtain a reference to itself: in a lambda, the this keyword refers to the enclosing instance, whereas in an anonymous class it refers to the anonymous class instance.

Finally, lambdas (like anonymous classes) should not be serialized, because their serialized forms are implementation-specific; if you need a serializable function object, such as a Comparator, use an instance of a private static nested class instead.
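
A minimal sketch of that last piece of advice (the class and field names here are illustrative, not from the notes above):

import java.io.Serializable;
import java.util.Comparator;

public class Comparators {
  // Expose a serializable function object backed by a private static nested class,
  // rather than a serializable lambda (whose serialized form is implementation-specific).
  public static final Comparator<String> BY_LENGTH = new LengthComparator();

  private static class LengthComparator implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;
    @Override public int compare(String s1, String s2) {
      return Integer.compare(s1.length(), s2.length());
    }
  }
}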

Item43: Prefer method references to lambdas

Lambdas are more succinct than anonymous classes, while method references are even more succinct than lambdas.

The following code snippet associates the number 1 with the key if the key is not yet in the map, and increments the associated value if the key is already present:

map.merge(key, 1, (count, incr) -> count + incr);  // this method was added in Java 8

Rewritten with a method reference:

map.merge(key, 1, Integer::sum);

Method references also help when a lambda grows too long or complex: extract the code from the lambda into a new method and replace the lambda with a reference to that method.
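
For example (a sketch; the validation rule and the isInvalid helper are made up for illustration):

import java.util.ArrayList;
import java.util.List;

public class KeyCleanup {
  public static void main(String[] args) {
    List<String> keys = new ArrayList<>(List.of("ok", " ", "some-very-long-key".repeat(10)));

    // Before: the policy is buried in an unnamed lambda
    // keys.removeIf(k -> k == null || k.isBlank() || k.length() > 64);

    // After: extract the policy into a named, documentable method and reference it
    keys.removeIf(KeyCleanup::isInvalid);
    System.out.println(keys);  // [ok]
  }

  // Rejects null, blank, or overly long keys
  private static boolean isInvalid(String key) {
    return key == null || key.isBlank() || key.length() > 64;
  }
}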

Many method references refer to static methods, but there are four kinds that do not.

  • bound instance method reference: the receiving object is specified in the method reference.
  • unbound instance method reference: the receiving object is specified when the function object is applied, via an additional parameter before the method's declared parameters.
  • constructor references, for classes and for arrays: the resulting function objects serve as factories.
Method Ref Type     Example                   Lambda Equivalent
Static              Integer::parseInt         str -> Integer.parseInt(str)
Bound               Instant.now()::isAfter    Instant then = Instant.now(); t -> then.isAfter(t)
Unbound             String::toLowerCase       str -> str.toLowerCase()
Class Constructor   TreeMap<K,V>::new         () -> new TreeMap<K,V>()
Array Constructor   int[]::new                len -> new int[len]

The simplest rule: Where method references are shorter and clearer, use them; where they aren't, stick with lambdas.

Item44: Favor the use of standard functional interfaces

Now that Java has lambdas, a class can provide a static factory or constructor that accepts a function object to achieve the same effect as older patterns such as Template Method. The java.util.function package provides a large collection of standard functional interfaces. If one of the standard functional interfaces does the job, we should generally use it in preference to a purpose-built functional interface.

There are forty-three interfaces in java.util.function. The six basic interfaces operate on object reference types:

  • The Operator interfaces represent functions whose result and argument types are the same.
  • The Predicate interface represents a function that takes an argument and returns a boolean.
  • The Function interface represents a function whose argument and return types differ.
  • The Supplier interface represents a function that takes no arguments and returns (or "supplies") a value.
  • The Consumer interface represents a function that takes an argument and returns nothing, essentially consuming its argument.
Interface           Function Signature     Example
UnaryOperator<T>    T apply(T t)           String::toLowerCase
BinaryOperator<T>   T apply(T t1, T t2)    BigInteger::add
Predicate<T>        boolean test(T t)      Collection::isEmpty
Function<T, R>      R apply(T t)           Arrays::asList
Supplier<T>         T get()                Instant::now
Consumer<T>         void accept(T t)       System.out::println

There are also three variants of each of the six basic interfaces that operate on the primitive types int, long, and double. Their names are derived from the basic interfaces by prefixing them with a primitive type, as in IntPredicate and LongBinaryOperator.

There are nine additional variants of the Function interface, for use when the result type is primitive. If both the source and result types are primitive, prefix Function with SrcToResult, as in LongToIntFunction. If the source is a primitive and the result is an object reference, prefix Function with <Src>ToObj.

There are two-argument versions of the three basic functional interfaces for which it makes sense to have them: BiPredicate<T, U>, BiFunction<T, U, R>, and BiConsumer<T, U>.

Finally, there is the BooleanSupplier interface, a variant of Supplier that returns boolean values.
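
A few of the variants just described in action (a sketch; the class and variable names are illustrative):

import java.util.function.*;

public class VariantExamples {
  public static void main(String[] args) {
    IntPredicate isEven = i -> i % 2 == 0;                          // int variant of Predicate<Integer>
    LongBinaryOperator add = (a, b) -> a + b;                       // long variant of BinaryOperator<Long>
    LongToIntFunction digitCount = l -> Long.toString(l).length();  // SrcToResult variant of Function
    BiFunction<String, Integer, String> prefix = (s, n) -> s.substring(0, n); // two-argument Function
    BooleanSupplier alwaysTrue = () -> true;                        // boolean-returning Supplier

    System.out.println(isEven.test(4));                 // true
    System.out.println(add.applyAsLong(2, 3));          // 5
    System.out.println(digitCount.applyAsInt(12345L));  // 5
    System.out.println(prefix.apply("lambdas", 6));     // lambda
    System.out.println(alwaysTrue.getAsBoolean());      // true
  }
}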

Most of the standard functional interfaces exist only to provide support for primitive types. Don't be tempted to use basic functional interfaces with boxed primitives instead of primitive functional interfaces. And we should always annotate our functional interfaces with the @FunctionalInterface annotation.
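
As a sketch of the kind of purpose-built interface being discussed, here is an eldest-entry removal policy for a map, annotated as a functional interface:

import java.util.Map;

@FunctionalInterface
interface EldestEntryRemovalFunction<K, V> {
  // Returns true if the map's eldest entry should be removed
  boolean remove(Map<K, V> map, Map.Entry<K, V> eldest);
}

Even here, a standard BiPredicate<Map<K,V>, Map.Entry<K,V>> would usually serve; a purpose-built interface is warranted mainly when it will be commonly used, benefits from a descriptive name, carries a strong contract, or needs custom default methods, as Comparator does.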

Do not provide a method with multiple overloadings that take different functional interfaces in the same argument position if it could create a possible ambiguity in the client.
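
The classic illustration is ExecutorService.submit, which has overloadings taking a Callable<T> and a Runnable. Because println is itself overloaded, the compiler cannot use the method reference System.out::println to pick between the two overloadings of submit, and the call does not compile without a cast (a sketch):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SubmitAmbiguity {
  public static void main(String[] args) {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    // exec.submit(System.out::println);          // ambiguous: submit(Runnable) vs submit(Callable<T>)
    exec.submit((Runnable) System.out::println);  // the cast resolves the ambiguity
    exec.shutdown();
  }
}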

Item45: Use streams judiciously

A stream pipeline consists of a source stream followed by zero or more intermediate operations and one terminal operation. Each intermediate operation transforms the stream in some way, and the terminal operation performs a final computation on the stream resulting from the last intermediate operation.

Stream pipelines are evaluated lazily: evaluation doesn't start until the terminal operation is invoked, and data elements that aren't required in order to complete the terminal operation are never computed.

Consider this program, which reads words from a dictionary file and prints every anagram group whose size meets a user-specified minimum, written without streams:

// Prints all sufficiently large anagram groups in a dictionary, iteratively
import java.io.File;
import java.io.IOException;
import java.util.*;

public class Anagrams {
  public static void main(String[] args) throws IOException {
    File dictionary = new File(args[0]);
    int minGroupSize = Integer.parseInt(args[1]);
    Map<String, Set<String>> groups = new HashMap<>();
    try (Scanner s = new Scanner(dictionary)) {
      while (s.hasNext()) {
        String word = s.next();
        groups.computeIfAbsent(alphabetize(word), (unused) -> new TreeSet<>()).add(word);
      }
    }
    for (Set<String> group : groups.values())
      if (group.size() >= minGroupSize)
      	System.out.println(group.size() + ": " + group);
  }
  private static String alphabetize(String s) {
    char[] a = s.toCharArray();
    Arrays.sort(a);
    return new String(a);
  }
}

Keep in mind: Overusing streams makes programs hard to read and maintain. And in the absence of explicit types, careful naming of lambda parameters is essential to the readability of stream pipelines. Also, we should refrain from using streams to process char values.

Basic rule: refactor existing code to use streams and use them in new code only where it makes sense to do so.
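
For instance, a tasteful streams refactoring of the body of main in the Anagrams program above might look something like this (a sketch reusing the same dictionary, minGroupSize, and alphabetize helper; Files.lines replaces the Scanner):

// Requires: import java.nio.file.Files; import java.util.stream.Stream;
// and a static import of java.util.stream.Collectors.groupingBy
try (Stream<String> words = Files.lines(dictionary.toPath())) {
  words.collect(groupingBy(word -> alphabetize(word)))
       .values().stream()
       .filter(group -> group.size() >= minGroupSize)
       .forEach(g -> System.out.println(g.size() + ": " + g));
}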

There are some things you can do from code blocks that you can't do from function objects:

  • From a code block, you can read or modify any local variable in scope; from a lambda, you can only read final or effectively final variables, and you can't modify any local variables.
  • From a code block, you can return from the enclosing method, break or continue an enclosing loop, or throw any checked exception that this method is declared to throw; from a lambda you can do none of these things.

One thing that is hard to do with streams is to access corresponding elements from multiple stages of a pipeline simultaneously: once you map a value to some other value, the original value is lost.
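
The classic illustration is a Mersenne-primes pipeline (a sketch, not code from the notes above): each prime p is mapped to 2^p - 1, so p itself is gone by the terminal operation, but it can be recovered by inverting the mapping, since the bit length of 2^p - 1 is exactly p.

import java.math.BigInteger;
import java.util.stream.Stream;

import static java.math.BigInteger.ONE;
import static java.math.BigInteger.TWO;

public class MersennePrimes {
  static Stream<BigInteger> primes() {
    return Stream.iterate(TWO, BigInteger::nextProbablePrime);
  }

  public static void main(String[] args) {
    primes().map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
        .filter(mersenne -> mersenne.isProbablePrime(50))
        .limit(20)
        // p was lost in map(); bitLength() recovers it for the report
        .forEach(mp -> System.out.println(mp.bitLength() + ": " + mp));
  }
}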

In the end, if you're not sure whether a task is better served by streams or iteration, try both and see which works better.

Item46: Prefer side-effect-free functions in streams

Streams aren't just an API; they are a paradigm based on functional programming.

The most important part of the streams paradigm is to structure your computation as a sequence of transformations in which the result of each stage is as close as possible to a pure function of the result of the previous stage. A pure function depends only on its input: it neither reads nor updates any mutable state. To obtain the expressiveness, speed, and in some cases parallelizability that streams have to offer, you have to adopt the paradigm as well as the API.

Note: A forEach operation that does anything more than present the result of the computation performed by a stream is a "bad smell in code," as is a lambda that mutates state. The forEach operation should be used only to report the result of a stream computation, not to perform the computation.
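
For example, contrast these two ways of building a word-frequency table freq from a Stream<String> named words (a sketch assuming the usual java.util imports and static imports of Collectors.groupingBy and Collectors.counting; the resulting map is the kind of freq table used in the top-ten pipeline below):

// Bad: uses the stream API, but the real work happens in forEach, mutating external state
Map<String, Long> freq = new HashMap<>();
words.forEach(word -> freq.merge(word.toLowerCase(), 1L, Long::sum));

// Good (alternative to the above): the computation happens inside the collector,
// and forEach is not needed at all
Map<String, Long> freq = words.collect(groupingBy(String::toLowerCase, counting()));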

The collectors for gathering the elements of a stream into a true Collection are straightforward. There are three such collectors: toList(), toSet(), and toCollection(collectionFactory).

For example, this pipeline extracts a top-ten list of words from a frequency table freq (like the one built above):

List<String> topTen = freq.keySet().stream()
  .sorted(comparing(freq::get).reversed())
  .limit(10)
  .collect(toList());

It is customary and wise to statically import all members of Collectors because it makes stream pipelines more readable.

In addition to toMap(keyMapper, valueMapper), which gathers the elements of a stream into a map, the Collectors API provides the groupingBy method, which returns collectors that produce maps grouping elements into categories based on a classifier function:

words.collect(groupingBy(word -> alphabetize(word)))
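
For comparison, a minimal toMap sketch (again assuming a Stream<String> named words and a static import of Collectors.toMap; the mapping is illustrative):

// Maps each distinct word to its length; toMap takes a key mapper and a value mapper.
// distinct() avoids the IllegalStateException that toMap throws on duplicate keys.
Map<String, Integer> wordLengths = words.distinct().collect(toMap(w -> w, String::length));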

Item47: Prefer Collection to Stream as a return type

Stream does not extend Iterable, so streams do not make iteration obsolete: you cannot iterate over a stream with a for-each loop. The Collection interface is a subtype of Iterable and has a stream method, so it provides for both iteration and stream access. Collection or an appropriate subtype is therefore generally the best return type for a public, sequence-returning method. But do not store a large sequence in memory just to return it as a collection.

The disadvantage of using Collection as a return type rather than Stream or Iterable: Collection has an int-returning size method, which limits the length of the returned sequence to Integer.MAX_VALUE, or 2^31 − 1.

Also worth noting from the book's sublists example: the flatMap method is used to generate a single stream consisting of all the suffixes of all the prefixes of a list, and the prefixes and suffixes themselves are generated by mapping streams of consecutive int values returned by IntStream.range and IntStream.rangeClosed.
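
A sketch of the kind of code being described (the class and method names are illustrative, following the book's sublists example rather than these notes):

import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SubLists {
  // Returns a stream of all the sublists of its input list
  public static <E> Stream<List<E>> of(List<E> list) {
    return Stream.concat(Stream.of(Collections.emptyList()),
        prefixes(list).flatMap(SubLists::suffixes));
  }

  private static <E> Stream<List<E>> prefixes(List<E> list) {
    return IntStream.rangeClosed(1, list.size())
        .mapToObj(end -> list.subList(0, end));
  }

  private static <E> Stream<List<E>> suffixes(List<E> list) {
    return IntStream.range(0, list.size())
        .mapToObj(start -> list.subList(start, list.size()));
  }
}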

The adapter approach has its own flaws: not only does a Stream-to-Iterable adapter clutter up client code, it also slows down the loop, by a factor of 2.3 in the book's measurements.
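
For reference, the adapters in question look something like this (the names iterableOf and streamOf are conventional, not part of any library):

import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Adapters {
  // Adapter from Stream<E> to Iterable<E>
  // Usage: for (String s : iterableOf(someStream)) { ... }
  public static <E> Iterable<E> iterableOf(Stream<E> stream) {
    return stream::iterator;
  }

  // Adapter from Iterable<E> to Stream<E>
  public static <E> Stream<E> streamOf(Iterable<E> iterable) {
    return StreamSupport.stream(iterable.spliterator(), false);
  }
}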

Item48: Use caution when making streams parallel

The streams library often has no idea how to parallelize a pipeline, and its heuristics can fail. Parallelizing a pipeline is unlikely to increase its performance if the source is Stream.iterate or if the intermediate operation limit is used. Worse, the default parallelization strategy deals with the unpredictability of limit by assuming there's no harm in processing a few extra elements and discarding any unneeded results. So do not parallelize stream pipelines indiscriminately.

As a rule, performance gains from parallelism are best on streams over ArrayList, HashMap, HashSet, and ConcurrentHashMap instances; arrays; int ranges; and long ranges. These data structures can be split accurately and cheaply into subranges, and they all provide good-to-excellent locality of reference when processed sequentially.

Not only can parallelizing a stream lead to poor performance, including liveness failures; it can lead to incorrect results and unpredictable behavior (safety failures). It's important to remember that parallelizing a stream is strictly a performance optimization.

Under the right circumstances, it is possible to achieve near-linear speedup in the number of processor cores simply by adding a parallel call to a stream pipeline.

// Prime-counting stream pipeline - parallel version
static long pi(long n) {
  return LongStream.rangeClosed(2, n)
    .parallel()
    .mapToObj(BigInteger::valueOf)
    .filter(i -> i.isProbablePrime(50))
    .count();
}

In summary, do not even attempt to parallelize a stream pipeline unless you have good reason to believe that it will preserve the correctness of the computation and increase its speed. The cost of inappropriately parallelizing a stream can be a program failure or performance disaster.