#devoxx 09: map&reduce and closures


A hot topic here at Devoxx was the upcoming Java editions, with their new features and changes to the language syntax. While it is nice that you will be able to switch() on Strings, get a modularized platform and other cool stuff, one thing is a bit odd: the hype closures get. Java 7 will be delayed mainly because of them. But what is the hype about?

Closures are tricky to explain, and I am not going to try. Basically, a closure is a piece of code that gets a life of its own: it can still reference state from the scope it was created in, even after that scope no longer exists.
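To make "state that outlives its scope" a bit more concrete, here is a minimal sketch. It assumes the lambda syntax that Java eventually shipped in Java 8, not any of the proposals discussed at the conference, and the names `ClosureSketch` and `makeCounter` are made up for illustration.

```java
import java.util.function.IntSupplier;

class ClosureSketch {
    // Returns a closure that keeps counting after makeCounter() has returned.
    static IntSupplier makeCounter() {
        // Local state. The one-element array is a workaround: Java lambdas
        // may only capture effectively final variables, so the reference is
        // final while its content stays mutable.
        int[] count = {0};
        return () -> ++count[0];
    }

    public static void main(String[] args) {
        IntSupplier counter = makeCounter();
        // makeCounter() is long gone, yet its local state lives on.
        System.out.println(counter.getAsInt()); // 1
        System.out.println(counter.getAsInt()); // 2
    }
}
```

Each call to `makeCounter()` creates its own independent state, so two counters do not interfere with each other.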

Closures are neat, no doubt. I like programming with them in JavaScript, and I was pretty much attracted by Clojure when Howard Lewis Ship demoed it yesterday in his session. With a language like Clojure it is crystal clear that this is a new language; it comes with its very own concepts that you need to understand before you can develop in it. Knowing how to develop in OO Java does not help with developing in Clojure, even though it has interop with Java. I do wonder where Java is heading by adding closures. The average Java developer does not understand them or, even if he does, has no need for them. You need a different mental model to use them successfully.

This new model is map&reduce. Well, ok, map and reduce are not the mental model behind Clojure, but understanding them allows you to develop in Java and use closures efficiently. Map&reduce came up in almost every session, and it seems to be the answer to the scaling question. It simplifies a lot, because it only works when you simplify. To quote Gregor Hohpe: “if we would have rebuilt a relational database, just to make it faster, we would have failed”. This is so true. The inherent complexity of a relational model / database cannot simply be made faster. If that were possible, Oracle would certainly have done it already.

So we have to simplify, but how? Treat data as key-value pairs, like in the good old COBOL times. Dump relations. What makes them so important? They just cause trouble.

Once you have made it this far in your mental transition, you are ready to go with any language, with or without closures. You can do map and reduce and parallelize as you like. What links them is that you accept handling one item at a time, you have no global state or side effects, and you invoke your closures as workers on the data, which return results.
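As a rough illustration of "one item at a time, no global state, workers that return results", here is a small sketch using the stream API that arrived with Java 8 (an assumption, since none of this existed at the time of the conference). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class MapReduceSketch {
    // Map step: handles exactly one item and touches no shared state.
    static int wordLength(String word) {
        return word.length();
    }

    // Reduce step: combines the per-item results. Because addition is
    // associative and the map step is side-effect free, the runtime may
    // split the work across threads in any way it likes.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                    .map(MapReduceSketch::wordLength)
                    .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "closures");
        System.out.println(totalLength(words)); // 3 + 6 + 8 = 17
    }
}
```

Switching `parallelStream()` to `stream()` gives the same result sequentially, which is exactly the point: the code per item does not care how the work is distributed.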

Gregor showed a few patterns they apply at Google. I especially liked Sawzall, which does not seem to have been open sourced by Google yet. It is a nice, descriptive and concise calculation instruction per data item. It's actually pretty much a single closure:

errorcount table sum of int
if contains input 'error'
  emit errorcount <- 1
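For comparison, the same idea could look roughly like this in Java. This is only a sketch of what the snippet above expresses, not how Sawzall works internally; it assumes Java 8 streams, and `ErrorCountSketch` and `errorCount` are invented names.

```java
import java.util.Arrays;
import java.util.List;

class ErrorCountSketch {
    // For every input record containing "error", emit a 1;
    // the sum of all emitted values is the error count.
    static long errorCount(List<String> records) {
        return records.parallelStream()
                      .filter(record -> record.contains("error")) // if contains input 'error'
                      .mapToLong(record -> 1L)                    // emit errorcount <- 1
                      .sum();                                     // table sum of int
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "startup ok",
                "error: disk full",
                "request served",
                "error: timeout");
        System.out.println(errorCount(log)); // 2
    }
}
```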

It is basically a call to think outside the box. We cannot solve our problems without doing so, because we cannot improve the existing solutions any further; they are already highly optimized. We need something new to solve our challenges.

