Avoiding state in Streams and Lambdas

September 17, 2017

Why would I want to avoid state?

We all know and love imperative programming, right?
It is simple to grasp, we are familiar with it and we are used to debugging it.
So why would you want to avoid state and imperative programming?
One day you get a very important job: the company is receiving some data and it is important for business to find a very specific piece of information in this data.
To keep things simple, let’s imagine that the data is in a list of strings and you want to find the longest string.

Since Java 8 we have Lambdas and Streams, you might come up with something like this:

public static void main(String[] args) {
   List<String> elements = new ArrayList<>(Arrays.asList("First", "Second", "Third", "SuperLongFifth", "Sixth"));
    
   String longest = elements.stream().sorted(
       (x, y) -> y.length() - x.length()
   ).findFirst().get();

   System.out.println(longest);
}

The problem with this is that it performs a sorting which can consume lots of resources and time if there are lots of elements. To find the longest element the whole list is sorted.
Alright, then let’s write it as we would do it before Java 8, with nice for loops we got used to:

public static void main(String[] args) {
   List<String> elements = new ArrayList<>(Arrays.asList("First", "Second", "Third", "SuperLongFifth", "Sixth"));
    
   String longest = "";

   for(String element : elements){
       if(element.length() > longest.length()){
           longest = element;
       }
   }

   System.out.println(longest);
}

Now let’s stop a moment and look at the code. Just by looking at it, it is not simple to tell what is it doing. We loop through the elements and then compare the element we are looking at to the  longest element we found so far. Not the most complicated code ever, but just because it is familiar, it does not mean it is easily readable.
But that is the least of our worries, because the problem is that this code is inherently serial. If we ever need to make it parallel, then we have to rewrite the whole code, since we are changing a state with the help of the longest variable and it is because of this mutable state, that the code is not thread safe.

How can we avoid maintaining the mutable state while iterating through the collection?

That’s a very good question, actually it is so good, that I came across of it at both sides of various interviews.
The answer is: using recursion. So how can we rewrite this code to use recursion?
Each step of the recursion will do this:

  1. It gets as a parameter the longest String and the elements we have not processed yet
  2. It will compare the first element of the list with the longest String: if the first element of the list is longer, then that is the new longest String
  3. Check if the list does become empty after removing the first element of the list. If it does, then return the longest String, if it is not empty, then call itself with the longest String and the list without the first element
  4. The recursion starts with an empty list as the longest already found String and the whole list of elements.

So it would look like this:

public class StreamsAndLambdaPractice {
    public static void main(String[] args) {
        List<String> elements = new ArrayList<>(Arrays.asList("First", "Second", "Third", "SuperLongFifth", "Sixth"));

        System.out.println(findLongestString("", elements));
    }

    private static String findLongestString(String longest, List<String> elements){
        //If the list is empty then we are done and we can return the longest element
        if(elements.size() == 0)
            return longest;

        //Let's see if the next element in the list is longer or not
        String next = elements.get(0);
        if(next.length() > longest.length()){
            longest = next;
        }

        //Remove the first element
        elements.remove(0);

        //Call the method again
        return findLongestString(longest, elements);
    }
}

I know, I know, you feel that it is less readable and understandable than the for loop. Let’s pause a moment here and contemplate on a thought: is it less understandable because it is more complex than the for loop or because we are more accustomed to writing loops? But yeah, I agree, this is not too readable either.
However, there are bigger problems with this recursive method: it does not scale. If there are too many elements in the list, you will either run out of memory (OutOfMemoryException) or you will get a stack overflow (StackOverflowError) due to the excessive use of stack frames.

Streams and Lambdas to the rescue

The recursive solution seemed promising, so let’s see how can we get rid of the stack overflow and out of memory issues.
The solution is the reduce from the Filter-Map-Reduce pattern. A reduce is a method which looks like this: Optional<T> reduce(BinaryOperator<T> accumulator).
It returns an Optional result of a type to avoid nulls and takes a BinaryOperator<T> of the same type called accumulator.  But what is a BinaryOperator?
A BinaryOperator<T> is a functional interface which has this functional method: T apply(T x, T y).  Takes two parameters of the same type and returns a result with the same type as the parameters. This apply will take a partial result as the first parameter (x), the next element as the second parameter (y). As you will see, it will be very similar to the recursive solution:

public static void main(String[] args) {
    List<String> elements = new ArrayList<>(Arrays.asList("First", "Second", "Third", "SuperLongFifth", "Sixth"));

    String longestElementViaReduction = elements.stream().reduce((x, y) -> {
        if(x.length() > y.length()) {
            return x;
        } else {
            return y;
        }
    }).get();

    System.out.println(longestElementViaReduction);
}

Let’s rewrite the statement in the reduce into a proper Lambda:

public static void main(String[] args) {
    List<String> elements = new ArrayList<>(Arrays.asList("First", "Second", "Third", "SuperLongFifth", "Sixth"));

    String longestElementViaReduction = elements.stream().reduce(
        (x, y) -> ((x.length() > y.length()) ? x : y)
    ).get();

    System.out.println(longestElementViaReduction);
}

What happens here is that x is maintaining the state for us as a partial result (the longest String we have found so far).
Alright, this looks good, there is less boilerplate code, the one single line inside the reduce is  all what we need to understand to see what this code does. But that is not all: since there are no side-effects or mutable states, hence it is very easy to make this code parallel: just change stream() into parallelStream() and the underlying fork/join framework will take care of all the nitty-gritty by splitting the work for multiple threads. Beware! It is not always faster to use parallel streams, but that is a topic for an another day.

Okay, I hear what you are saying, you are not convinced that this code is that much readable, since we had to use reduce which at first can be seen as cryptic. Luckily there is a max method on Streams which looks like this: Optional<T> max(Comparator<? super T> comparator). It returns an optional result of a type and takes a Comparator of the return type.
What we need is the Comparator.comparingInt method: it takes a function as a parameter which will be used to make the comparison:

public static void main(String[] args) {
    List<String> elements = new ArrayList<>(Arrays.asList("First", "Second", "Third", "SuperLongFifth", "Sixth"));

    String longestElementViaMax = elements.stream().max(
        Comparator.comparingInt(String::length)
    ).get();

    System.out.println(longestElementViaMax);
}

This code retains all the good stuff from the previous code (less boilerplate code, no side-effects, no mutable states, easy to make it parallel) while being quite readable: we are looking for the “max” (eg.: longest) element and we are comparing the elements with the use of String.length() to find the longest element.

Conclusion

Whenever you are thinking about writing a for loop or a forEach on a stream, stop. Take a deep breath and think about if you really need that forEach.
Do you really need to iterate through the elements to achieve your aim? Or, are you achieving your aim with the help of a loop? If your answer to the first question is a “no” or “maybe” and a “yes” to the second question, then maybe a reduce, min, max, filter or map is what you need.

András Döbröntey

About the Author

András Döbröntey

Leave a Comment:

Bitnami