Streaming CSV files with the Stream API in Java 8

Today I will show you how to use the Java 8 Stream API to parse the contents of a CSV file of persons.

This is a follow-up to yesterday's blog post, where I produced a comma-separated value string with the help of the Stream API.

Why use the Stream API to parse our CSV?

The Stream API is a handy abstraction for working with aggregated data. It becomes particularly handy when we need to perform multiple actions, such as transforming the content, applying some filters and perhaps grouping the rows by a property. With the Stream API we can register all the actions we want to perform on each row in the CSV file, and do it at a decent level of abstraction. We want the framework to handle the low-level stuff, such as reading and looping over the data, while we stay in control of what we want to achieve.

The Stream API is a perfect fit for the task I want to solve today. I have a CSV file with a lot of persons, shown (simplified) in the example below. The first task is to read all the “lines” of persons and turn them into a list of persons, List<Person>.

Example CSV file

Name, Age, City, Country
Ivar Østhus, 28, Oslo, Norway
Petter Dass, 19, Hålogaland, Norway
Ola Nordmann, 61, Sandnes, Norway
Viswanathan Anand, 43, Mayiladuthurai, India
Magnus Carlsen, 22, Tønsberg, Norway
…(about 1 million rows)

This list can be tremendously long, and I want to fetch the first 50 adults (age > 17). Luckily, the BufferedReader in Java 8 has been upgraded to provide the Stream abstraction. All I need to do is call the .lines() method on BufferedReader. (The Stream abstraction is where Java 8 keeps all its functional sweetness, such as map, filter, max, min, sum, etc.)

Solution with Stream API

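// Requires: import static java.util.stream.Collectors.toList;
InputStream is = new FileInputStream(new File("persons.csv"));
BufferedReader br = new BufferedReader(new InputStreamReader(is));

List<Person> persons = br.lines()
    .skip(1)  // skip the header line (pre-release Java 8 builds called this substream(1))
    .map(mapToPerson)
    .filter(person -> person.getAge() > 17)
    .limit(50)
    .collect(toList());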

In the example we see that we skip the first line (this is the header line in our CSV file), using the skip(1) function (called substream(1) in pre-release builds of Java 8).

Next we map the person from a CSV line to a Person object. We use a predefined lambda function for this:

// A bit hackish: assumes every field is separated by exactly ", " and that no field contains a comma
public static Function<String, Person> mapToPerson = (line) -> {
  String[] p = line.split(", ");
  return new Person(p[0], Integer.parseInt(p[1]), p[2], p[3]);
};

Then we filter on age and call the limit function, telling the Stream API that we only want the first 50 persons matching our criterion (must be an adult).

Another cool thing we can do fairly easily with the Stream abstraction is to compute the average age of all the persons in the list:
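To try the whole pipeline without a million-row file, here is a minimal, self-contained sketch. The Person class is a hypothetical stand-in (the post does not show its own), the class and method names are made up for illustration, the row for "Kari Nordmann" is invented test data, and the CSV comes from an in-memory string rather than persons.csv. It uses skip(1), which is the name the header-skipping operation has in the released Java 8 API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class FirstAdults {

    // Hypothetical stand-in for the Person class used in this post.
    static final class Person {
        private final String name;
        private final int age;
        private final String city;
        private final String country;

        Person(String name, int age, String city, String country) {
            this.name = name;
            this.age = age;
            this.city = city;
            this.country = country;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
        String getCountry() { return country; }
    }

    // Splits on commas and trims each field, so "a, b" and "a,b" both work.
    // Still assumes that no field contains a comma.
    static Person toPerson(String line) {
        String[] p = line.split(",");
        return new Person(p[0].trim(), Integer.parseInt(p[1].trim()), p[2].trim(), p[3].trim());
    }

    // Skips the header, maps each row, keeps adults and stops after max matches.
    static List<Person> firstAdults(BufferedReader reader, int max) {
        return reader.lines()
                .skip(1)
                .map(FirstAdults::toPerson)
                .filter(person -> person.getAge() > 17)
                .limit(max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String csv = "Name, Age, City, Country\n"
                + "Kari Nordmann, 12, Bergen, Norway\n"
                + "Ola Nordmann, 61, Sandnes, Norway\n"
                + "Viswanathan Anand, 43, Mayiladuthurai, India\n";
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            for (Person p : firstAdults(reader, 50)) {
                System.out.println(p.getName() + " (" + p.getAge() + ")");
            }
        }
    }
}
```

Because the stream is lazy, limit stops pulling lines from the reader once enough adults have been found, which is what makes this approach attractive for a very long file.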

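// Requires: import static java.util.stream.Collectors.averagingInt;
// br must be a freshly opened reader; an earlier pipeline has already consumed lines from the old one
double averageAge = br.lines()
  .skip(1)
  .map(mapToPerson)
  .collect(averagingInt(Person::getAge));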
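One thing to watch out for: a reader (and the stream on top of it) can only be consumed once, so each computation needs its own freshly opened reader. The sketch below shows the averaging step in isolation. The class name and the ageOf helper are made up for illustration, and it reads the age straight from the second column instead of building Person objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.stream.Collectors;

class AverageAge {

    // Hypothetical helper: reads the age from the second column,
    // assuming rows shaped like "Name, Age, City, Country".
    static int ageOf(String line) {
        return Integer.parseInt(line.split(",")[1].trim());
    }

    // Skips the header line and averages the ages of all remaining rows.
    // averagingInt yields 0.0 when there are no rows.
    static double averageAge(BufferedReader reader) {
        return reader.lines()
                .skip(1)
                .collect(Collectors.averagingInt(AverageAge::ageOf));
    }

    public static void main(String[] args) throws IOException {
        String csv = "Name, Age, City, Country\n"
                + "Ola Nordmann, 61, Sandnes, Norway\n"
                + "Viswanathan Anand, 43, Mayiladuthurai, India\n";
        // A fresh reader for this computation
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println(averageAge(reader));
        }
    }
}
```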

Or find the oldest person in the list:

Optional<Person> oldestPerson = br.lines()
  .skip(1)
  .map(mapToPerson)
  .max(byAge);

//Lambda expression:
static Comparator<Person> byAge =
  (p1, p2) -> Integer.compare(p1.getAge(), p2.getAge());

(Here we get an Optional of a person, in good functional manner.)
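As an alternative to writing the comparator lambda by hand, Java 8 also ships Comparator.comparingInt, which builds the same ordering from a getter. Here is a small sketch of that, working on an in-memory list; the Person class is again a hypothetical stand-in, and the class and method names are made up for illustration. It also shows one way to unwrap the Optional.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class OldestPerson {

    // Hypothetical, trimmed-down Person with only the fields needed here.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Orders persons by age; comparingInt avoids the subtraction trick entirely.
    static final Comparator<Person> BY_AGE = Comparator.comparingInt(Person::getAge);

    // Returns an empty Optional when the list is empty.
    static Optional<Person> oldest(List<Person> persons) {
        return persons.stream().max(BY_AGE);
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ola Nordmann", 61),
                new Person("Viswanathan Anand", 43));
        // map/orElse unwraps the Optional without an explicit isPresent check
        String name = oldest(persons).map(Person::getName).orElse("nobody");
        System.out.println(name);

        List<Person> none = Collections.emptyList();
        System.out.println(oldest(none).isPresent());
    }
}
```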

Summary

We could of course use all of the other cool Collectors, such as groupingBy and counting, which are ready to use with the Stream API. I will probably blog more about the Stream API soon.

Thanks for reading my blog. Some other posts about Java 8 and lambdas I have written recently:
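As a small taste of those collectors, here is a sketch that counts persons per country by combining Collectors.groupingBy with Collectors.counting. The class name and the countryOf helper are made up for illustration, and the method expects data rows only (header already removed).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PersonsPerCountry {

    // Hypothetical helper: reads the country from the fourth column,
    // assuming rows shaped like "Name, Age, City, Country".
    static String countryOf(String line) {
        return line.split(",")[3].trim();
    }

    // Groups the rows by country and counts how many rows fall in each group.
    static Map<String, Long> countByCountry(List<String> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(PersonsPerCountry::countryOf, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Ola Nordmann, 61, Sandnes, Norway",
                "Viswanathan Anand, 43, Mayiladuthurai, India",
                "Magnus Carlsen, 22, Tonsberg, Norway");
        countByCountry(rows).forEach(
                (country, count) -> System.out.println(country + ": " + count));
    }
}
```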