Streaming CSV files with the Stream API in Java 8

Today I will show you how to utilize the Java 8 Stream API to parse the content of a CSV file of persons.

This is a follow-up to my blog post from yesterday, where I produced a comma-separated value string with the help of the Stream API.

Why use the Stream API to parse our CSV?

The Stream API is a handy abstraction for working with aggregated data. It becomes particularly handy when we need to perform multiple actions, such as transforming the content, applying some filters and perhaps grouping the rows by a property. With the Stream API we can register the actions we want to perform on each row in the CSV file, and do so at a decent level of abstraction. We want the framework to handle the low-level stuff, such as reading and looping over the data, while we stay in control of what we want to achieve.

The Stream API is a perfect fit for the task I want to solve today. I have a CSV file with a lot of persons, presented in the example below (simplified). The first task is to read all the “lines” of persons and turn them into a list of persons, List<Person>.

Example CSV file

Name, Age, City, Country
Ivar Østhus, 28, Oslo, Norway
Petter Dass, 19, Hålogaland, Norway
Ola Nordmann, 61, Sandnes, Norway
Viswanathan Anand, 43, Mayiladuthurai, India
Magnus Carlsen, 22, Tønsberg, Norway
…(about 1 million rows)

This list can be tremendously long, and I want to fetch the first 50 adults (age > 17). Luckily, the BufferedReader in Java 8 has been upgraded to provide the Stream abstraction. All I need to do is call the .lines() method on BufferedReader. (The Stream abstraction is where Java 8 keeps all its functional sweetness, such as map, filter, max, min, sum, etc.)

Solution with Stream API

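// Assumes a static import of java.util.stream.Collectors.toList
InputStream is = new FileInputStream(new File("persons.csv"));
BufferedReader br = new BufferedReader(new InputStreamReader(is));

List<Person> persons = br.lines()
    .skip(1)                                  // skip the header line
    .map(mapToPerson)                         // CSV line -> Person
    .filter(person -> person.getAge() > 17)   // adults only
    .limit(50)                                // stop after 50 matches
    .collect(toList());

In the example we skip the first line (the header line in our CSV file) using the skip(1) function. (Pre-release builds of Java 8 called this substream(1); the released API has skip instead.)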

Next we map the person from a CSV line to a Person object. We use a predefined lambda function for this:

//A bit hackish
public static Function<String, Person> mapToPerson = (line) -> {
  String[] p = line.split(", ");
  return new Person(p[0], Integer.parseInt(p[1]), p[2], p[3]);
};

Then we just call the limit function, telling the Stream API that we only want the first 50 persons matching our criteria (must be adult).
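To make the whole pipeline easy to try out, here is a minimal, self-contained sketch. It reads from a StringReader instead of persons.csv, and the Person class, the toPerson helper and the rows marked as made up are my assumptions for illustration, not part of the original file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class CsvStreamSketch {

    // Hypothetical minimal Person class; the post does not show its own.
    static final class Person {
        private final String name;
        private final int age;
        private final String city;
        private final String country;

        Person(String name, int age, String city, String country) {
            this.name = name;
            this.age = age;
            this.city = city;
            this.country = country;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Splits on commas and trims each field, so "a,b" and "a, b" both work.
    static Person toPerson(String line) {
        String[] p = line.split(",");
        return new Person(p[0].trim(), Integer.parseInt(p[1].trim()), p[2].trim(), p[3].trim());
    }

    // Returns at most max adults, in file order, skipping the header line.
    static List<Person> firstAdults(BufferedReader reader, int max) {
        return reader.lines()
                .skip(1)
                .map(CsvStreamSketch::toPerson)
                .filter(person -> person.getAge() > 17)
                .limit(max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String csv = "Name, Age, City, Country\n"
                + "Kari Nordmann, 12, Bergen, Norway\n"   // made-up row (a minor)
                + "Ola Nordmann, 61, Sandnes, Norway\n"
                + "Viswanathan Anand, 43, Mayiladuthurai, India\n"
                + "Per Hansen, 35, Bergen, Norway\n";     // made-up row

        // try-with-resources closes the reader when we are done.
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            firstAdults(reader, 2).forEach(p -> System.out.println(p.getName()));
            // prints "Ola Nordmann" then "Viswanathan Anand"
        }
    }
}
```

Because streams are lazy and limit is a short-circuiting operation, a sequential pipeline like this stops pulling lines from the reader once enough adults have been found, which is exactly what we want for a file with a million rows.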

Another cool thing we can do fairly easily with the Stream abstraction is to compute the average age of all the persons in the list:

// br must be a freshly opened reader: its lines can only be consumed once
double averageAge = br.lines()
  .skip(1)
  .map(mapToPerson)
  .collect(averagingInt(Person::getAge));

Or find the oldest person in the list:

// Again assumes br is a freshly opened reader
Optional<Person> oldestPerson = br.lines()
  .skip(1)
  .map(mapToPerson)
  .max(byAge);

//Lambda expression:
static Comparator<Person> byAge =
  (p1, p2) -> p1.getAge() - p2.getAge();

(Here we get an Optional of a person, in good functional manner.)
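The two computations above can be tried without a file. The sketch below works on an in-memory list and shows one way to unwrap the Optional. The Person class here is a hypothetical stand-in with only a name and an age, and Comparator.comparingInt is my substitution for the hand-written byAge comparator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class AgeStatsSketch {

    // Hypothetical stand-in for the post's Person class (name and age only).
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    static double averageAge(List<Person> persons) {
        // averagingInt returns 0.0 for an empty stream.
        return persons.stream().collect(Collectors.averagingInt(Person::getAge));
    }

    static Optional<Person> oldest(List<Person> persons) {
        // comparingInt builds the comparator from the age getter.
        return persons.stream().max(Comparator.comparingInt(Person::getAge));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ola Nordmann", 61),
                new Person("Viswanathan Anand", 43),
                new Person("Magnus Carlsen", 22));

        System.out.println(averageAge(persons)); // prints 42.0

        // The Optional is empty for an empty list, so we supply a fallback.
        String name = oldest(persons).map(Person::getName).orElse("nobody");
        System.out.println(name); // prints Ola Nordmann
    }
}
```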

Summary

We could of course use all of the other cool Collectors, such as groupingBy and counting, which are ready to use with the Stream API. I will probably blog more about the Stream API soon.

Thanks for reading my blog. Some other posts about Java 8 and lambdas I have written recently:
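As a small taste of those Collectors, here is a sketch that counts persons per country by combining groupingBy with counting. The Person class (name and country only) and the countByCountry helper are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingSketch {

    // Hypothetical stand-in for the post's Person class (name and country only).
    static final class Person {
        private final String name;
        private final String country;

        Person(String name, String country) {
            this.name = name;
            this.country = country;
        }

        String getCountry() { return country; }
    }

    // Groups the persons by country and counts the members of each group.
    static Map<String, Long> countByCountry(List<Person> persons) {
        return persons.stream()
                .collect(Collectors.groupingBy(Person::getCountry, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ola Nordmann", "Norway"),
                new Person("Viswanathan Anand", "India"),
                new Person("Magnus Carlsen", "Norway"));

        Map<String, Long> counts = countByCountry(persons);
        System.out.println(counts.get("Norway")); // prints 2
        System.out.println(counts.get("India"));  // prints 1
    }
}
```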

What are lambdas in Java 8?

In this blog post I will briefly introduce lambdas which will be included as a new language feature in Java 8. I recently wrote a short introduction to functional programming support coming in Java 8. In this post I want to focus on lambda expressions, what they actually are and why they are awesome.

The motivation behind lambda expressions is to provide a nice and simple syntax for passing functionality as arguments to another method, such as what to do when someone clicks a button. Pre JDK8 we used anonymous inner classes to do that, which typically implemented a functional interface (more details below). The problem with that approach is that the syntax was verbose and unclear. It was really hard to write and read. Lambda expressions let you express instances of single-method classes more compactly [1]. We can think of lambda expressions as a way to define anonymous methods.

Take lambdas for a spin

Let’s start simple. We have a List of numbers and we want to print all of the numbers using the new super awesome forEach method, which accepts a Consumer as its argument. Without lambdas we can achieve this by implementing a Consumer:

List<Integer> numbers = Arrays.asList(1,2,3,4,5,6);

numbers.forEach(new Consumer<Integer>() {
  @Override
  public void accept(Integer value) {
    System.out.println(value);
  }
});

Wow that anonymous class looks really ugly and verbose. I actually prefer the regular enhanced-for loop over this ugly thing!

Lambda to the rescue

Thankfully lambdas come to the rescue and allow us to just define the consumer function we want executed for each element:

List<Integer> numbers = Arrays.asList(1,2,3,4,5,6);
numbers.forEach((Integer value) -> System.out.println(value));

This is way better!! But there is still some noise in my eyes. Why do I have to tell the compiler the type of value? Can value possibly be anything other than Integer?

The answer is: NO, it must be an Integer. And it turns out that the Java 8 compiler can help us out by working out the type for us, so we don’t have to, hurray! This concept is known as type inference. Let’s have a look:

List<Integer> numbers = Arrays.asList(1,2,3,4,5,6);
numbers.forEach(value -> System.out.println(value));

Wow, this is starting to look like something readable. We even got rid of the parentheses around the value parameter.

It’s important to notice that the forEach method still accepts a Consumer as input, and that it is the compiler that takes the provided lambda expression and converts it into a valid Consumer.
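One way to see this conversion is to give the lambda a name by assigning it to a Consumer variable first. The sketch below does that; the class name and the describe helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerSketch {

    // Collects a description of each number instead of printing it,
    // so the effect of the Consumer is easy to inspect.
    static List<String> describe(List<Integer> numbers) {
        List<String> out = new ArrayList<>();
        // The target type Consumer<Integer> tells the compiler that
        // value is an Integer and that the lambda implements accept(Integer).
        Consumer<Integer> collector = value -> out.add("value " + value);
        numbers.forEach(collector);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(describe(numbers)); // prints [value 1, value 2, value 3]

        // The same kind of object, this time passed to forEach directly.
        Consumer<Integer> printer = value -> System.out.println(value);
        numbers.forEach(printer);
    }
}
```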

Method Reference

Even though we ended up with a simple and easy to read lambda expression there is still something bothering me. We have created a function which takes the input argument and just calls a new function with the same argument as input.

Can’t we just use the println function directly instead? The answer is a method reference, and here is an example:

List<Integer> numbers = Arrays.asList(1,2,3,4,5,6);
numbers.forEach(System.out::println);

We see that we use the special ‘::’ notation, which allows us to borrow methods from elsewhere. The result in this example is that the forEach method will call the println method of System.out for each element in the list.

(Side note: It is also possible to refer to a constructor by using new, e.g. User::new.)
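To illustrate the side note, here is a small sketch of a constructor reference. The User class with its one-argument constructor and the toUsers helper are hypothetical, invented only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ConstructorRefSketch {

    // Hypothetical User class with a one-argument constructor.
    static final class User {
        private final String name;

        User(String name) {
            this.name = name;
        }

        String getName() { return name; }
    }

    static List<User> toUsers(List<String> names) {
        // User::new is shorthand for name -> new User(name).
        Function<String, User> factory = User::new;
        return names.stream().map(factory).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = toUsers(Arrays.asList("ivar", "petter"));
        users.forEach(user -> System.out.println(user.getName()));
        // prints "ivar" then "petter"
    }
}
```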

Multiple blocks

Can we execute multiple lines of code in a lambda expression? Yes of course, just add some curly-brackets:

List<Integer> numbers = Arrays.asList(1,2,3,4,5,6);
numbers.forEach(value -> {
  String out = "Hi there value is: " + value;
  System.out.println(out);
});

Lexical Scoping & effectively final

A lambda expression closes over the scope of its definition (lexical scoping). From within a lambda expression we can only access local variables that are final or effectively final in the enclosing scope. Effectively final means that Java 8 relaxed the requirement to use the final keyword, but the variable still cannot change if we want to access it inside a lambda expression. If the compiler detects that the variable is mutated, inside or outside of the lambda expression, it will complain.

List<Integer> numbers = Arrays.asList(1,2,3,4,5,6);
int someVal = 1;
numbers.forEach(value -> System.out.println(value+someVal));
someVal = 2;

The compiler will complain in the above example because someVal is reassigned after its declaration and is therefore not effectively final.
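A common way around this, sketched below, is to copy the changing value into a variable that is never reassigned and capture the copy instead. The names addOffset and snapshot are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EffectivelyFinalSketch {

    static List<Integer> addOffset(List<Integer> numbers, int offset) {
        // offset is never reassigned inside this method, so it is
        // effectively final and the lambda may capture it.
        return numbers.stream()
                .map(value -> value + offset)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3);

        int someVal = 1;
        someVal = 2; // reassigned, so someVal itself cannot be captured

        // Copy the current value into a variable that never changes.
        final int snapshot = someVal;
        numbers.forEach(value -> System.out.println(value + snapshot));
        // prints 3, 4 and 5

        System.out.println(addOffset(numbers, 10)); // prints [11, 12, 13]
    }
}
```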

More examples

To round off the examples, here are a few legal lambda expressions:

(int x, int y) -> { return x + y; }

(x, y) -> x + y

x -> x + x

() -> x

value -> { System.out.println(value); }
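On their own these expressions do not compile; each one needs a target type. The sketch below assigns them to functional interfaces from java.util.function. Which interface to pick is my choice for illustration, since other interfaces with a matching method shape would work too.

```java
import java.util.function.Consumer;
import java.util.function.IntBinaryOperator;
import java.util.function.IntSupplier;
import java.util.function.IntUnaryOperator;

public class LambdaShapesSketch {

    // Explicit parameter types and a block body.
    static final IntBinaryOperator ADD_TYPED = (int x, int y) -> { return x + y; };

    // Inferred parameter types and an expression body.
    static final IntBinaryOperator ADD_INFERRED = (x, y) -> x + y;

    // A single parameter needs no parentheses.
    static final IntUnaryOperator DOUBLE_IT = x -> x + x;

    // () -> x is only legal when x is in scope and effectively final,
    // as the method parameter is here.
    static IntSupplier constant(int x) {
        return () -> x;
    }

    public static void main(String[] args) {
        // A block body that returns nothing fits Consumer.
        Consumer<String> printer = value -> { System.out.println(value); };

        printer.accept("typed sum: " + ADD_TYPED.applyAsInt(2, 3));       // typed sum: 5
        printer.accept("inferred sum: " + ADD_INFERRED.applyAsInt(2, 3)); // inferred sum: 5
        printer.accept("doubled: " + DOUBLE_IT.applyAsInt(4));            // doubled: 8
        printer.accept("constant: " + constant(7).getAsInt());            // constant: 7
    }
}
```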

Functional interfaces

We can use lambdas with methods that take a functional interface as an argument. This section briefly introduces what a functional interface is.
The only requirement for a functional interface is that it has exactly one abstract (unimplemented) method. It can have zero or more default methods. The example below shows a snippet from the Predicate interface, part of JDK8. The “FunctionalInterface” annotation is optional, but when present it makes the compiler verify that the interface has exactly one abstract method.

@FunctionalInterface
public interface Predicate<T> {
  boolean test(T t);
  default Predicate<T> and(Predicate<? super T> other) {
    Objects.requireNonNull(other);
    return (t) -> test(t) && other.test(t);
  }
  //...
}

The predicate is used to check whether an input argument satisfies our requirements. It has one abstract method, “test”, which should be implemented to verify the requirement. This functional interface also comes with the default methods and, or, and negate. The first two are used to combine multiple predicates and the last is used to invert a predicate. This allows us to reuse and build on top of existing predicates.
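Here is a short sketch of how these default methods can be used to build new predicates from existing ones. The two predicates and the keep helper are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PredicateSketch {

    static final Predicate<Integer> IS_EVEN = n -> n % 2 == 0;
    static final Predicate<Integer> IS_LARGE = n -> n > 3;

    // Keeps only the numbers that satisfy the given rule.
    static List<Integer> keep(List<Integer> numbers, Predicate<Integer> rule) {
        return numbers.stream().filter(rule).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        System.out.println(keep(numbers, IS_EVEN.and(IS_LARGE))); // prints [4, 6]
        System.out.println(keep(numbers, IS_EVEN.or(IS_LARGE)));  // prints [2, 4, 5, 6]
        System.out.println(keep(numbers, IS_EVEN.negate()));      // prints [1, 3, 5]
    }
}
```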

Other common functional interfaces found in the JDK8 java.util.function package include:

  1. Consumer<T> – takes an input and performs an operation on it. Will cause side effects!
  2. Supplier<T> – a kind of factory. Will return a new instance or an existing instance.
  3. Predicate<T> – Checks if argument satisfies our requirements
  4. Function<T, R> – Used to transform an argument from type T to type R.
  5. BinaryOperator<T> – two T’s as argument, return one T as output
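To make the list above concrete, here is a sketch with one tiny lambda or method reference per interface. The constant names are made up for illustration.

```java
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FunctionalInterfacesSketch {

    // Supplier<T>: takes nothing, returns a T.
    static final Supplier<String> GREETING = () -> "hello";

    // Predicate<T>: takes a T, returns a boolean.
    static final Predicate<String> IS_EMPTY = String::isEmpty;

    // Function<T, R>: transforms a T into an R.
    static final Function<String, Integer> LENGTH = String::length;

    // BinaryOperator<T>: takes two T's, returns one T.
    static final BinaryOperator<Integer> ADD = (a, b) -> a + b;

    public static void main(String[] args) {
        // Consumer<T>: takes a T, returns nothing, works through side effects.
        Consumer<Object> print = System.out::println;

        String greeting = GREETING.get();
        print.accept(greeting);                 // prints hello
        print.accept(IS_EMPTY.test(greeting));  // prints false
        print.accept(LENGTH.apply(greeting));   // prints 5
        print.accept(ADD.apply(2, 3));          // prints 5
    }
}
```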

Summary

Lambdas are awesome and the cornerstone of the introduction of functional programming in Java 8. They are a clever way to introduce functional programming in Java, and they are super simple to write. Letting lambdas be defined via functional interfaces (already heavily used, e.g. event listeners) allows existing code to be forward compatible with lambdas. Clever!

You might think that lambdas are just pretty syntax for creating anonymous inner classes under the hood, so that lambda capture just becomes constructor invocation. This is (thankfully) not the case. It would lead to performance issues (one class per lambda expression). Instead the language team uses the fifth bytecode method invocation mode, introduced in Java 7, called invokedynamic. I want to do a special post on lambdas under the hood later.