The Java Stream interface

The Java Stream interface defines an "iterator" that includes the logic of how we want to iterate through the elements in the stream. Another way of seeing things is that a Stream defines a "query" into the data in question.

In our introduction to streams, we saw, for example, that we could call limit() to specify that only a certain number of elements would be iterated through. We also saw an example of the filter() method combined with a lambda expression to determine specific items of data that we wanted to include or exclude from the stream.
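As a reminder of how those two calls fit together, here is a minimal sketch. The class name, method name and sample words are assumptions made purely for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class FilterLimitExample {

    // Returns at most 'max' strings from 'words' that are longer than 3 characters.
    static List<String> firstLongWords(List<String> words, int max) {
        return words.stream()
                .filter(w -> w.length() > 3)   // include only the items we want
                .limit(max)                    // stop after 'max' matching items
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "three", "four", "six", "seven", "eight");
        System.out.println(firstLongWords(words, 3));  // prints [three, four, seven]
    }
}
```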

In fact, the Stream interface defines a host of calls to specify particular properties that we would like the stream to have. The main ones are summarised in the table below.

Stream method / Example / Purpose

distinct()
  Example: Stream<Integer> distinctNos =
  Purpose: Returns a stream that presents each distinct item from the original stream only once.

limit()
  Example: Stream<Integer> firstIDs =
  Purpose: Returns a stream that stops iteration after the given number of elements.

sorted()
  Example: Stream<Integer> nos =
  Purpose: Returns a stream that pulls objects from the original stream in sorted order.

sorted()
  Example: Stream<Integer> sortedIDs =
  Purpose: Returns a stream that pulls successive objects out of the original stream in sorted order.

unordered()
  Example: Stream<?> unordered =
  Purpose: Returns a stream that does not guarantee any particular ordering on iteration. The reason for using unordered() is that by specifying that ordering is not important, certain optimisations may be possible.

skip()
  Example: Stream<String> middleNames =
  Purpose: Returns a stream that pulls successive objects out of the original stream, having skipped past the specified number of items.

dropWhile()
  Example: Stream<String> skipInitials =
             .dropWhile(s -> s.length() < 2);
  Purpose: Returns a stream of the items that remain after dropping the leading items that match the given condition.
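To make these methods concrete, the following sketch applies several of them to small in-memory lists. The class name, helper method names and sample data are hypothetical and are not taken from the original examples; only the Stream calls themselves are standard API (dropWhile() requires Java 9 or later).

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamMethodsExample {

    // distinct(): each distinct value is presented once, in encounter order.
    static List<Integer> distinctNos(List<Integer> nos) {
        return nos.stream().distinct().collect(Collectors.toList());
    }

    // limit(): iteration stops after the first n elements.
    static List<Integer> firstN(List<Integer> nos, int n) {
        return nos.stream().limit(n).collect(Collectors.toList());
    }

    // sorted(): elements are pulled out in their natural order.
    static List<Integer> sortedNos(List<Integer> nos) {
        return nos.stream().sorted().collect(Collectors.toList());
    }

    // skip(): the first n elements are skipped.
    static List<String> skipFirst(List<String> names, int n) {
        return names.stream().skip(n).collect(Collectors.toList());
    }

    // dropWhile(): leading one-letter strings are dropped; once a longer
    // string is seen, everything after it is kept, including later initials.
    static List<String> skipInitials(List<String> names) {
        return names.stream()
                .dropWhile(s -> s.length() < 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> nos = List.of(3, 1, 3, 2, 1);
        List<String> names = List.of("J", "R", "Tolkien", "X", "Smith");

        System.out.println(distinctNos(nos));     // prints [3, 1, 2]
        System.out.println(firstN(nos, 2));       // prints [3, 1]
        System.out.println(sortedNos(nos));       // prints [1, 1, 2, 3, 3]
        System.out.println(skipFirst(names, 1));  // prints [R, Tolkien, X, Smith]
        System.out.println(skipInitials(names));  // prints [Tolkien, X, Smith]
    }
}
```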

Lazy execution of stream operations

It is important to note that the above methods only define how the stream will be iterated once a terminal operation is invoked. So calling sorted(), for example, does not in itself cause the data to be sorted. Only when you call a terminal operation such as forEach() is the data in the stream actually iterated through, and it is at that moment that operations such as sorting occur, if they are necessary:

  .distinct()                    // <- Defines that we will filter
                                 //    on distinct strings
  .sorted()                      // <- Defines that we will sort
  .forEach(System.out::println)  // <- Actually filters, sorts and iterates
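One way to observe this laziness is to record events with peek(). The sketch below is illustrative only: the class name, the log messages and the sample strings are assumptions. It shows that nothing is pulled from the source until forEach() is called on a sequential stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyStreamExample {

    // Builds a pipeline, then runs it, recording the order in which things happen.
    static List<String> run() {
        List<String> log = new ArrayList<>();

        Stream<String> pipeline = Stream.of("pear", "apple", "pear", "fig")
                .peek(s -> log.add("pulled " + s))  // records each element leaving the source
                .distinct()                         // only *defines* the de-duplication
                .sorted();                          // only *defines* the sort

        log.add("pipeline defined");                // nothing has been pulled yet

        pipeline.forEach(s -> log.add("visited " + s));  // now everything actually runs
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // prints:
        // pipeline defined
        // pulled pear
        // pulled apple
        // pulled pear
        // pulled fig
        // visited apple
        // visited fig
        // visited pear
    }
}
```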

Stream state and optimisations

As we have mentioned, a key advantage of a Stream compared to a simple Iterator is that a stream encapsulates the information and logic that defines the iteration. In other words, a stream potentially "knows" whether its elements are sorted, distinct etc. Calling distinct() on a Stream can be a no-op if the stream originated from a Set, for example, since by definition, a set cannot contain two equal objects.
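The state that a stream "knows" about can be inspected through the characteristics reported by its Spliterator. The following is a minimal sketch with hypothetical class and method names. Whether a given call such as distinct() is actually skipped is an implementation detail of the JDK in use; the sketch only shows that the information is available to the stream.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class StreamStateExample {

    // True if a stream over 'source' reports that its elements are distinct.
    static boolean reportsDistinct(Collection<?> source) {
        return source.stream().spliterator().hasCharacteristics(Spliterator.DISTINCT);
    }

    // True if a stream over 'source' reports that its elements are sorted.
    static boolean reportsSorted(Collection<?> source) {
        return source.stream().spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        List<String> data = List.of("b", "a", "c");

        System.out.println(reportsDistinct(new ArrayList<>(data)));  // prints false
        System.out.println(reportsDistinct(new HashSet<>(data)));    // prints true
        System.out.println(reportsSorted(new HashSet<>(data)));      // prints false
        System.out.println(reportsSorted(new TreeSet<>(data)));      // prints true
    }
}
```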

This potential for optimisations means that the developer should avoid certain assumptions:

Combining distinct(), sorted(), limit() etc

The above stream "filtering" or "query" methods can be combined in potentially powerful ways, allowing you to query or search through the contents of a Java list or other collection in relatively few lines of code. Although some optimisations are possible because streams "know" about their state, it is important to stress that in general, stream filtering methods are executed independently of one another.

As an example of what we mean and why this is a potential limitation, imagine that we want to return the first 20 items of a list in sort order. We could achieve this as follows:
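A minimal sketch of such a query, assuming a hypothetical list of strings and illustrative class and method names, might look like this:

```java
import java.util.List;
import java.util.stream.Collectors;

class SortedLimitExample {

    // Returns the first 'n' items of 'strings' in sorted (natural) order.
    static List<String> firstInSortOrder(List<String> strings, int n) {
        return strings.stream()
                .sorted()     // defines that items are pulled in sorted order
                .limit(n)     // defines that we stop after n items
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> strings = List.of("delta", "bravo", "alpha", "charlie");
        // With a longer list, passing 20 would give the first 20 items in sorted order.
        System.out.println(firstInSortOrder(strings, 2));  // prints [alpha, bravo]
    }
}
```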

Now strictly speaking, to achieve this, there is no need to sort the entire list of strings. If there are 1000 strings in the list, we only need to know that strings 21-1000 fall somewhere after the first 20 strings. The specific ordering among those remaining 980 strings themselves is irrelevant. However, Stream.sorted() will force the entire list of 1000 strings to be sorted when they are iterated. In effect, the sort and limit operations are "unaware" of one another.
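This can be observed by counting how many elements are pulled into the sort. In the sketch below, which uses hypothetical names and generated integers rather than strings, a peek() placed before sorted() counts every element that the sort consumes, even though only 20 results are requested.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SortLimitCostExample {

    // Returns the first 'n' values in sorted order out of 'total' generated values,
    // incrementing 'pulled' once for every element that reaches the sort.
    static List<Integer> firstSorted(int total, int n, AtomicInteger pulled) {
        return IntStream.range(0, total)
                .map(i -> total - i)                  // total, total-1, ..., 1 (descending)
                .boxed()
                .peek(i -> pulled.incrementAndGet())  // counts elements fed into sorted()
                .sorted()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AtomicInteger pulled = new AtomicInteger();
        List<Integer> first = firstSorted(1000, 20, pulled);
        System.out.println(first.size() + " results, "
                + pulled.get() + " elements pulled into the sort");
        // prints: 20 results, 1000 elements pulled into the sort
    }
}
```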


Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.