Streaming CSV files with the Stream API in Java 8

Today I will show you how to utilize the Java 8 Stream API to parse the content of a CSV file of persons.

This is a follow-up to yesterday's blog post, where I produced a comma-separated value string with the help of the Stream API.

Why use the Stream API to parse our CSV?

The Stream API is a handy abstraction for working with aggregated data. It becomes particularly handy when we need to perform multiple actions, such as transforming the content, applying some filters and perhaps grouping the rows by a property. With the Stream API we are able to register a lot of actions we want to perform on each row in the CSV file, and do it at a decent level of abstraction. We want the framework to handle the low-level stuff, such as reading and looping over the data, while we stay in control of what we want to achieve.

The Stream API is a perfect fit for the task I want to solve today. I have a CSV file with a lot of persons, presented in the example below (simplified). The first task is to read all the “lines” of persons and turn them into a list of persons, List<Person>.

Example CSV file

Name, Age, City, Country
Ivar Østhus, 28, Oslo, Norway
Petter Dass, 19, Hålogaland, Norway
Ola Nordmann, 61, Sandnes, Norway
Viswanathan Anand, 43, Mayiladuthurai, India
Magnus Carlsen, 22, Tønsberg, Norway
…(about 1 million rows)

This list can be tremendously long and I want to fetch the first 50 adults (age > 17). Luckily the BufferedReader in Java 8 has been upgraded to provide me with the Stream abstraction. All I need to do is call the .lines() method on BufferedReader. (The Stream abstraction is where Java 8 has stored all its functional sweetness, such as map, filter, max, min, sum, etc.)

Solution with Stream API

InputStream is = new FileInputStream(new File("persons.csv"));
BufferedReader br = new BufferedReader(new InputStreamReader(is));

// toList() is statically imported from java.util.stream.Collectors
List<Person> persons = br.lines()
    .skip(1)
    .map(mapToPerson)
    .filter(person -> person.getAge() > 17)
    .limit(50)
    .collect(toList());

In the example we see that we skip the first line (this is the header line in our CSV file), using the skip(1) function (called substream(1) in the pre-release builds of Java 8).

Next we map the person from a CSV line to a Person object. We use a predefined lambda function for this:

//A bit hackish
public static Function<String, Person> mapToPerson = (line) -> {
  String[] p = line.split(", ");
  return new Person(p[0], Integer.parseInt(p[1]), p[2], p[3]);
};

After filtering on age, we just call the limit function, telling the Stream API that we only want the first 50 persons matching our criteria (must be adult).
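To try the whole pipeline without a file on disk, here is a minimal sketch that reads a few rows of the example CSV from an in-memory string. The Person class, the toPerson helper and the firstAdults method are assumptions made for illustration (the post does not show Person), and the split pattern is slightly more forgiving than the one above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class CsvAdultsExample {

    // Hypothetical Person class; the post does not show its definition.
    static final class Person {
        private final String name;
        private final int age;
        private final String city;
        private final String country;

        Person(String name, int age, String city, String country) {
            this.name = name;
            this.age = age;
            this.city = city;
            this.country = country;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
        String getCountry() { return country; }
    }

    // Splits on a comma followed by optional whitespace (an assumption about the file format).
    static Person toPerson(String line) {
        String[] p = line.split(",\\s*");
        return new Person(p[0].trim(), Integer.parseInt(p[1].trim()), p[2].trim(), p[3].trim());
    }

    // Skips the header line, keeps adults only and stops after 'max' matches.
    static List<Person> firstAdults(BufferedReader reader, int max) {
        return reader.lines()
                .skip(1)
                .map(CsvAdultsExample::toPerson)
                .filter(person -> person.getAge() > 17)
                .limit(max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // A few rows from the example file; the city name is ASCII-escaped.
        String csv = "Name, Age, City, Country\n"
                + "Ola Nordmann, 61, Sandnes, Norway\n"
                + "Viswanathan Anand, 43, Mayiladuthurai, India\n"
                + "Magnus Carlsen, 22, T\u00f8nsberg, Norway\n";

        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            firstAdults(reader, 2).forEach(p -> System.out.println(p.getName()));
        }
    }
}
```

Because limit is a short-circuiting operation, the stream stops pulling lines from the reader once enough adults have been found, which is what makes this approach attractive for a file with about a million rows.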

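Another cool thing we can do fairly easily with the Stream abstraction is to compute the average age of all the persons in the list. (Each of the following snippets assumes a freshly opened reader, since the lines of a BufferedReader can only be consumed once.)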

double averageAge = br.lines()
  .skip(1)
  .map(mapToPerson)
  .collect(averagingInt(Person::getAge));

Or find the oldest person in the list:

Optional<Person> oldestPerson = br.lines()
  .skip(1)
  .map(mapToPerson)
  .max(byAge);

//Lambda expression:
static Comparator<Person> byAge =
  (p1, p2) -> p1.getAge() - p2.getAge();

(Here we get an Optional of a person, in good functional manner.)
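The two aggregations can also be tried on an in-memory list, which sidesteps the question of reopening the reader. The sketch below is only an illustration: the two-field Person class and the method names are assumptions, and it uses Comparator.comparingInt as an alternative to the subtraction-based comparator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class AgeStatsExample {

    // Hypothetical, reduced Person class with only the fields needed here.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Average age; averagingInt yields 0.0 for an empty list.
    static double averageAge(List<Person> persons) {
        return persons.stream().collect(Collectors.averagingInt(Person::getAge));
    }

    // Oldest person; the Optional is empty when the list is empty.
    static Optional<Person> oldest(List<Person> persons) {
        return persons.stream().max(Comparator.comparingInt(Person::getAge));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Petter Dass", 19),
                new Person("Ola Nordmann", 61),
                new Person("Viswanathan Anand", 43),
                new Person("Magnus Carlsen", 22));

        System.out.println(averageAge(persons));
        oldest(persons).ifPresent(p -> System.out.println(p.getName()));
    }
}
```

Comparator.comparingInt avoids the integer overflow that a subtraction-based comparator can run into with extreme values. For ages that is not a practical concern, so the lambda shown above works as well.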

Summary

We could of course use all of the other cool Collectors, such as groupingBy and counting, which are ready to use with the Stream API. I will probably blog more about the Stream API soon.
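As a small taste of those collectors, here is a sketch that counts rows per country by combining groupingBy with counting. The class and method names are made up for illustration, and the code assumes data rows of the form "Name, Age, City, Country" with the header already removed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingExample {

    // Groups the rows by their fourth column (the country) and counts each group.
    static Map<String, Long> countByCountry(List<String> rows) {
        return rows.stream()
                .map(row -> row.split(",\\s*"))
                .collect(Collectors.groupingBy(
                        fields -> fields[3].trim(),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // A few rows from the example file; the city name is ASCII-escaped.
        List<String> rows = Arrays.asList(
                "Ola Nordmann, 61, Sandnes, Norway",
                "Viswanathan Anand, 43, Mayiladuthurai, India",
                "Magnus Carlsen, 22, T\u00f8nsberg, Norway");

        System.out.println(countByCountry(rows));
    }
}
```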

Thanks for reading my blog. Some other posts about Java 8 and lambdas I have written recently:

Getting functional in Java 8

In September I attended JavaOne 2013 in San Francisco. Oracle was showing off Java 8, scheduled for GA in Q1 2014. The feature coming in Java 8 which excited me the most was the functional part, introduced with Project Lambda.

All the other major platforms, such as C#, have had this for years, and finally Java is growing up and will introduce functional programming in Java 8. In previous versions of Java we have been so used to imperative-style programming that it is hard to even realize there are alternatives. It has worked fine, but it is very low level and has an extremely verbose syntax compared to the alternatives. With the new functional features we are now able to express what we want to achieve more concisely and not worry so much about how to actually do it. Java 8 enables us to use old-school, rock-solid OO design and combine it with functional patterns. Combined, we will be able to achieve more with less, meaning fewer bugs and more value delivered. This is a big change for Java, even bigger than generics, introduced a few years back.

In this post I will present a few simple examples on how you can utilize functions in Java 8.

Setup

As a basis for each example I will have a list of persons as shown in the snippet below.

List<Person> persons = new ArrayList<>();
persons.add(new Person("Knerten Lillebror", 12));
persons.add(new Person("Kari Normann", 29));
persons.add(new Person("Ole Hansen", 32));

Passing functions in Java 8

Say you want to print the name of each person in the list. How do we do that in Java today (pre Java 8)? EASY! We loop the list and for each item in the list we print the name. We even use the enhanced for loop. Pretty simple, right?

for(Person p : persons) {
  System.out.println(p.getName());
}

This is referred to as the imperative code style. There are mainly two problems with this example:

  1. We have to introduce a temporary variable (p)
  2. We have to know HOW to iterate a list (the for loop)

Not only do we express WHAT we want to do, we also have to express HOW to do it, iterating all the elements and introducing a mutable element. In Java 8, we now have a forEach method on collections, which allows us to pass a function. The underlying framework will take care of how to loop each element. We will need to pass a Consumer, which performs an operation on each element:

persons.forEach(p -> System.out.println(p));
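To make the Consumer part more concrete, the sketch below passes the same behaviour to forEach twice: once as an anonymous class, the way it would have to be written without lambdas, and once as a lambda expression. To keep it short it works on a list of names rather than Person objects, and the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class ConsumerExample {

    // Records every name handed to the consumers, so the effect of forEach can be inspected.
    static List<String> visit(List<String> names) {
        List<String> seen = new ArrayList<>();

        // Without lambdas: an anonymous class implementing Consumer.
        Consumer<String> verbose = new Consumer<String>() {
            @Override
            public void accept(String name) {
                seen.add(name);
            }
        };

        // The same behaviour written as a lambda expression.
        Consumer<String> concise = name -> seen.add(name);

        names.forEach(verbose);
        names.forEach(concise);
        return seen;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Knerten Lillebror", "Kari Normann", "Ole Hansen");
        // Every name is recorded twice, once per consumer.
        visit(names).forEach(name -> System.out.println(name));
    }
}
```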

Remove elements

The collection also makes it super simple to remove elements from a collection. We just use a lambda expression, a predicate, to express which elements we want removed. How would you implement this pre Java 8?

persons.removeIf(p -> p.getAge() > 20);

* We could also use the syntax “(Person p) -> p.getAge() > 20” to specify the type. This is optional, as the type is automatically inferred by the compiler.

WARNING. I generally do not feel it is good practice to use this function, as it mutates the actual list. In my opinion it would be better if it returned a new list/view, without the elements matched by the predicate.
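As one possible answer to the question above, and to the wish for a non-mutating variant, here is a sketch of both. The first method removes elements in place with an explicit Iterator, as one would before Java 8. The second builds a new list with a stream and leaves the original untouched. The Person class and the method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

class RemoveExample {

    // Hypothetical, reduced Person class with only the fields needed here.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Pre-Java 8 style: mutate the list through an explicit Iterator.
    static void removeOlderThanInPlace(List<Person> persons, int age) {
        for (Iterator<Person> it = persons.iterator(); it.hasNext(); ) {
            if (it.next().getAge() > age) {
                it.remove();
            }
        }
    }

    // Non-mutating alternative: return a new list and leave the input as it was.
    static List<Person> withoutOlderThan(List<Person> persons, int age) {
        return persons.stream()
                .filter(p -> p.getAge() <= age)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = new ArrayList<>(Arrays.asList(
                new Person("Knerten Lillebror", 12),
                new Person("Kari Normann", 29),
                new Person("Ole Hansen", 32)));

        List<Person> young = withoutOlderThan(persons, 20);
        System.out.println(young.size() + " kept, original still has " + persons.size());

        removeOlderThanInPlace(persons, 20);
        System.out.println("after in-place removal: " + persons.size());
    }
}
```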

Method references

In Java 8 we will also be able to borrow functions from other classes using the “::” notation:

persons.forEach(System.out::println);
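Method references are not limited to System.out::println. The sketch below shows two more forms using only JDK classes: a reference to an instance method of the element itself (String::toUpperCase) and a reference to a static method (Integer::parseInt). The class and method names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MethodReferenceExample {

    // String::toUpperCase is called on each element of the stream.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Integer::parseInt is a static method that receives each element as its argument.
    static List<Integer> parsed(List<String> numbers) {
        return numbers.stream()
                .map(Integer::parseInt)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Knerten Lillebror", "Kari Normann", "Ole Hansen");

        // System.out::println refers to the println method of one particular object.
        upperCased(names).forEach(System.out::println);
        parsed(Arrays.asList("12", "29", "32")).forEach(System.out::println);
    }
}
```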

Function blocks

It’s even possible to pass a function block:

persons.forEach(p -> {
    System.out.print("hi there: ");
    System.out.println(p);
});

This is the first quick post with many more to come. The Stream API is especially interesting and I will cover a lot of it soon.