MapReduce testing with MRUnit

In one of the previous posts on our blog, my colleague gave us a nice example of how to test a map/reduce job. His starting point was an implementation done using Apache Pig. In this post I would like to extend his example by adding a little twist to it. The map/reduce job I am going to test is the same one he used, but implemented in Java.
A multi-threaded environment can be a hostile place to dwell in, and debugging and testing it is not easy. With map/reduce, things get even more complex. These jobs run in a distributed fashion, across many JVMs in a cluster of machines. That is why it is important to use all the power of unit testing and run the tests in as much isolation as possible.
My colleague used PigUnit for testing his Pig script. I am going to use MRUnit, a Java library written to help with unit testing map/reduce jobs.
(read more…)

Dusan Zamurovic

Map/Reduce with Hadoop and Pig

Big data. One of the buzzwords of the software industry in the last decade. We have all heard about it, but I am not sure we can actually comprehend it as we should and as it deserves. It reminds me of the Universe: mankind knows that it is big, huge, vast, but no one can really understand its size. The same can be said for the amount of data being collected and processed every day somewhere in the clouds of IT. As Google’s CEO, Eric Schmidt, once said: “There were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days.”

Mankind is clearly capable of storing and persisting this hardly imaginable bulk of data, that’s for sure. What impresses me more is that we are able to process and analyze it in a reasonable time.

(read more…)

Dusan Zamurovic

Android persistence accelerated – revisited

Finally, after quite a while, we found some free time to work on the Android persistence library I wrote about in this blog post. Knowing we have a very tight schedule, as always, we wanted to make sure the library is ready to be used. So, we took a good look at what we did before, rolled up our sleeves and got to work.
The main goal was to make the library stable and useful. To achieve that, some of the functionality was reimplemented, some new features were added and some were removed. There is no sense in keeping features that are not part of any complete logic and represent only fragments of a future feature set. Since those are useless when looked at separately, they could only confuse a person using the library.

(read more…)

Dusan Zamurovic