MapReduce testing with MRUnit

1.6.2014 | 5 minutes of reading time

In one of the previous posts on our blog, my colleague gave a nice example of how to test a map/reduce job. His starting point was an implementation written in Apache Pig. In this post I would like to extend his example with a little twist: the map/reduce job I am going to test is the same one he used, but implemented in Java.
A multi-threaded environment can be a hostile place to dwell in, and debugging and testing it is not easy. With map/reduce things get even more complex. These jobs run in a distributed fashion, across many JVMs in a cluster of machines. That is why it is important to use all the power of unit testing and to run the tests in as much isolation as possible.
My colleague used PigUnit to test his Pig script. I am going to use MRUnit, a Java library written to help with unit testing map/reduce jobs.

The logic of the example is the same as in the mentioned post. There are two input paths. One contains user information: user id, first name, last name, country, city and company. The other holds each user’s awesomeness rating as a pair: user id, rating value.

# user information
1,Ozren,Gulan,Serbia,Novi Sad,codecentric
2,Petar,Petrovic,Serbia,Belgrade,some.company
3,John,Smith,England,London,brits.co
4,Linda,Jefferson,USA,New York,ae.com
5,Oscar,Hugo,Sweden,Stockholm,swe.co
123,Random,Random,Random,Random,Random

# rating information
1,1000
2,15
3,200
4,11
5,5

*Disclaimer: Test data is highly reliable and taken from real life, so if it turns out that Ozren has the highest rating, he tweaked it :).

Our MR job reads the inputs line by line and joins the information about users with their awesomeness rating. It filters out all users with a rating of less than 150, leaving only awesome people in the results.
I decided not to show the full Java code in the post because it is not important. It is enough to know what goes in and what we expect as a result of the job. Those interested in the implementation details can find them here. These are just the signatures of the mapper and reducer classes; they determine the types of the input and output data:

public class AwesomenessRatingMapper
    extends Mapper<LongWritable, Text, LongWritable, AwesomenessRatingWritable> {
    // ...
}

public class AwesomenessRatingReducer
    extends Reducer<LongWritable, AwesomenessRatingWritable, LongWritable, Text> {
    // ...
}
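
For readers who want a feel for the join-and-filter logic without opening the real implementation, here is a minimal, Hadoop-free sketch in plain Java. It is not the code of the actual job: the class name, the method names, the way record types are told apart (by counting fields), the inner-join behaviour and the output format are all assumptions made for illustration. Only the threshold of 150 and the input layout come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hadoop-free sketch of the join-and-filter logic; all names here are hypothetical. */
public class AwesomenessJoinSketch {

    // Threshold taken from the post: ratings below 150 are filtered out.
    static final int MIN_RATING = 150;

    /** "Map" step: key every line by its user id, whatever kind of record it is. */
    static Map<Long, List<String[]>> groupByUserId(List<String> lines) {
        Map<Long, List<String[]>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            long userId = Long.parseLong(fields[0].trim());
            grouped.computeIfAbsent(userId, id -> new ArrayList<>()).add(fields);
        }
        return grouped;
    }

    /** "Reduce" step: join user info (6 fields) with its rating (2 fields), then filter. */
    static List<String> joinAndFilter(Map<Long, List<String[]>> grouped) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, List<String[]>> entry : grouped.entrySet()) {
            String[] userInfo = null;
            Integer rating = null;
            for (String[] fields : entry.getValue()) {
                if (fields.length == 6) {
                    userInfo = fields;
                } else if (fields.length == 2) {
                    rating = Integer.parseInt(fields[1].trim());
                }
            }
            // Assumed inner join: ids that lack either the user info or the rating are dropped.
            if (userInfo != null && rating != null && rating >= MIN_RATING) {
                result.add(entry.getKey() + "\t" + userInfo[1] + " " + userInfo[2] + "\t" + rating);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,Ozren,Gulan,Serbia,Novi Sad,codecentric",
                "2,Petar,Petrovic,Serbia,Belgrade,some.company",
                "3,John,Smith,England,London,brits.co",
                "1,1000", "2,15", "3,200");
        joinAndFilter(groupByUserId(lines)).forEach(System.out::println);
    }
}
```

With the sample lines in main, only users 1 and 3 survive the filter, because user 2 has a rating of 15.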

There are three main MRUnit classes that drive our tests: MapDriver, ReduceDriver and MapReduceDriver. They are generic classes whose type parameters depend on the input and output types of the mapper, the reducer and the whole map/reduce job, respectively. This is how we instantiate them:

AwesomenessRatingMapper mapper = new AwesomenessRatingMapper();
MapDriver<LongWritable, Text, LongWritable, AwesomenessRatingWritable> mapDriver = MapDriver.newMapDriver(mapper);

AwesomenessRatingReducer reducer = new AwesomenessRatingReducer();
ReduceDriver<LongWritable, AwesomenessRatingWritable, LongWritable, Text> reduceDriver = ReduceDriver.newReduceDriver(reducer);

MapReduceDriver<LongWritable, Text, LongWritable, AwesomenessRatingWritable, LongWritable, Text> mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);

MRUnit gives us tools to write tests in different styles. The first approach is the more traditional one: we specify the input, run the job (or a part of it) and check whether the output looks like we expect. In other words, we do the assertions by hand.

@Test
public void testMapperWithManualAssertions() throws Exception {
    mapDriver.withInput(new LongWritable(0L), TestDataProvider.USER_INFO);
    mapDriver.withInput(new LongWritable(1L), TestDataProvider.RATING_INFO);

    Pair<LongWritable, AwesomenessRatingWritable> userInfoTuple = new Pair<LongWritable, AwesomenessRatingWritable>(
                    TestDataProvider.USER_ID, TestDataProvider.USER_INFO_VALUE);
    Pair<LongWritable, AwesomenessRatingWritable> ratingInfoTuple = new Pair<LongWritable, AwesomenessRatingWritable>(
                    TestDataProvider.USER_ID, TestDataProvider.RATING_INFO_VALUE);

    List<Pair<LongWritable, AwesomenessRatingWritable>> result = mapDriver.run();

    Assertions.assertThat(result).isNotNull().hasSize(2).contains(userInfoTuple, ratingInfoTuple);
}

// ...

@Test
public void testReducerWithManualAssertions() throws Exception {
    ImmutableList<AwesomenessRatingWritable> values = ImmutableList.of(TestDataProvider.USER_INFO_VALUE,
                    TestDataProvider.RATING_INFO_VALUE);
    ImmutableList<AwesomenessRatingWritable> valuesFilteredOut = ImmutableList.of(
                    TestDataProvider.USER_INFO_VALUE_FILTERED_OUT, TestDataProvider.RATING_INFO_VALUE_FILTERED_OUT);

    reduceDriver.withInput(TestDataProvider.USER_ID, values);
    reduceDriver.withInput(TestDataProvider.USER_ID_FILTERED_OUT, valuesFilteredOut);

    Pair<LongWritable, Text> expectedTupple = new Pair<LongWritable, Text>(TestDataProvider.USER_ID,
                    TestDataProvider.RESULT_TUPPLE_TEXT);

    List<Pair<LongWritable, Text>> result = reduceDriver.run();

    Assertions.assertThat(result).isNotNull().hasSize(1).containsExactly(expectedTupple);
}

// ...

@Test
public void testMapReduceWithManualAssertions() throws Exception {
    mapReduceDriver.withInput(new LongWritable(0L), TestDataProvider.USER_INFO);
    mapReduceDriver.withInput(new LongWritable(1L), TestDataProvider.RATING_INFO);
    mapReduceDriver.withInput(new LongWritable(3L), TestDataProvider.USER_INFO_FILTERED_OUT);
    mapReduceDriver.withInput(new LongWritable(4L), TestDataProvider.RATING_INFO_FILTERED_OUT);

    Pair<LongWritable, Text> expectedTupple = new Pair<LongWritable, Text>(TestDataProvider.USER_ID,
                    TestDataProvider.RESULT_TUPPLE_TEXT);

    List<Pair<LongWritable, Text>> result = mapReduceDriver.run();

    Assertions.assertThat(result).isNotNull().hasSize(1).containsExactly(expectedTupple);
}

The other approach is to specify both the input and the expected output. In this case we do not have to write the assertions ourselves; we let the framework do it.

@Test
public void testMapperWithAutoAssertions() throws Exception {
    mapDriver.withInput(new LongWritable(0L), TestDataProvider.USER_INFO);
    mapDriver.withInput(new LongWritable(1L), TestDataProvider.RATING_INFO);

    mapDriver.withOutput(TestDataProvider.USER_ID, TestDataProvider.USER_INFO_VALUE);
    mapDriver.withOutput(TestDataProvider.USER_ID, TestDataProvider.RATING_INFO_VALUE);

    mapDriver.runTest();
}

// ...

@Test
public void testReducerWithAutoAssertions() throws Exception {
    ImmutableList<AwesomenessRatingWritable> values = ImmutableList.of(TestDataProvider.USER_INFO_VALUE,
                    TestDataProvider.RATING_INFO_VALUE);
    ImmutableList<AwesomenessRatingWritable> valuesFilteredOut = ImmutableList.of(
                    TestDataProvider.USER_INFO_VALUE_FILTERED_OUT, TestDataProvider.RATING_INFO_VALUE_FILTERED_OUT);

    reduceDriver.withInput(TestDataProvider.USER_ID, values);
    reduceDriver.withInput(TestDataProvider.USER_ID_FILTERED_OUT, valuesFilteredOut);

    reduceDriver.withOutput(new Pair<LongWritable, Text>(TestDataProvider.USER_ID,
                    TestDataProvider.RESULT_TUPPLE_TEXT));

    reduceDriver.runTest();
}

// ...

@Test
public void testMapReduceWithAutoAssertions() throws Exception {
    mapReduceDriver.withInput(new LongWritable(0L), TestDataProvider.USER_INFO);
    mapReduceDriver.withInput(new LongWritable(1L), TestDataProvider.RATING_INFO);
    mapReduceDriver.withInput(new LongWritable(3L), TestDataProvider.USER_INFO_FILTERED_OUT);
    mapReduceDriver.withInput(new LongWritable(4L), TestDataProvider.RATING_INFO_FILTERED_OUT);

    Pair<LongWritable, Text> expectedTupple = new Pair<LongWritable, Text>(TestDataProvider.USER_ID,
                    TestDataProvider.RESULT_TUPPLE_TEXT);
    mapReduceDriver.withOutput(expectedTupple);

    mapReduceDriver.runTest();
}

The main difference is whether we call the driver’s run() or runTest() method. The first one just runs the job and returns the output without validating it. The second one also adds validation of the results to the execution flow.
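
To make the distinction between run() and runTest() more tangible, here is a toy driver in plain Java that mimics the two styles. It is only a sketch of the idea and not MRUnit’s implementation: the MiniDriver class, its single-argument withInput and withOutput methods and the strict in-order comparison are all assumptions of mine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Toy driver illustrating run() versus runTest(); hypothetical, not MRUnit code. */
public class MiniDriver<I, O> {
    private final Function<I, List<O>> unitUnderTest;
    private final List<I> inputs = new ArrayList<>();
    private final List<O> expected = new ArrayList<>();

    public MiniDriver(Function<I, List<O>> unitUnderTest) {
        this.unitUnderTest = unitUnderTest;
    }

    public MiniDriver<I, O> withInput(I input) {
        inputs.add(input);
        return this;
    }

    public MiniDriver<I, O> withOutput(O output) {
        expected.add(output);
        return this;
    }

    /** Executes the unit and hands the raw output back; the caller asserts on it. */
    public List<O> run() {
        List<O> actual = new ArrayList<>();
        for (I input : inputs) {
            actual.addAll(unitUnderTest.apply(input));
        }
        return actual;
    }

    /** Executes the unit and compares the output with the registered expectations. */
    public void runTest() {
        List<O> actual = run();
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        MiniDriver<String, String> driver =
                new MiniDriver<String, String>(s -> Collections.singletonList(s.toUpperCase()))
                        .withInput("ozren")
                        .withOutput("OZREN");
        System.out.println(driver.run()); // manual style: inspect the returned list
        driver.runTest();                 // automatic style: throws if the output differs
    }
}
```

The manual style gives full freedom over the assertions (size, order, partial matches), while the automatic style keeps the test short when an exact expected output is known.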

There are some nice things in MRUnit I wanted to point out (some of them are shown in this post in more detail). For example…
The method List<Pair<K2, V2>> MapDriver#run() returns a list of pairs, which is useful for testing situations in which the mapper produces key/value pairs for a given input. This is what we used in the first approach, when we checked the results of the mapper run by hand.

Then, both MapDriver and ReduceDriver have a getContext() method. It returns the Context for further mocking; the online documentation has some short but clear examples of how to do it.

Why not mention counters? Counters are the easiest way to measure and track the number of operations that happen in map/reduce programs. There are some built-in counters like “Spilled Records”, “Map output records”, “Reduce input records” or “Reduce shuffle bytes”. MRUnit supports inspecting them through the getCounters() method of each of the drivers.
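
The counter idea itself can be shown without Hadoop. The following is a small sketch under my own assumptions: the CounterSketch class and the RatingCounter enum are hypothetical and not part of the job from this post, and a real job would increment Hadoop counters through its context instead of a local map. The filter condition mirrors the threshold of 150 described earlier.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hadoop-free sketch of the counter idea; names are hypothetical, not the post's job. */
public class CounterSketch {

    enum RatingCounter { KEPT, FILTERED_OUT }

    private final Map<RatingCounter, Long> counters = new EnumMap<>(RatingCounter.class);

    /** Mirrors the job's filter (a rating below 150 is dropped) and counts both outcomes. */
    boolean accept(int rating) {
        boolean keep = rating >= 150;
        counters.merge(keep ? RatingCounter.KEPT : RatingCounter.FILTERED_OUT, 1L, Long::sum);
        return keep;
    }

    /** Returns the current value of a counter, or 0 if it was never incremented. */
    long get(RatingCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        CounterSketch sketch = new CounterSketch();
        // Ratings from the sample data at the top of the post.
        for (int rating : new int[] {1000, 15, 200, 11, 5}) {
            sketch.accept(rating);
        }
        System.out.println("kept=" + sketch.get(RatingCounter.KEPT)
                + " filteredOut=" + sketch.get(RatingCounter.FILTERED_OUT));
    }
}
```

A test can then assert on the counter values in addition to the output records, which catches cases where the right records come out for the wrong reason.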

The TestDriver class provides a facility for setting up a mock configuration: TestDriver#getConfiguration() lets you change only those parts of the configuration you need to change.

Finally, MapReduceDriver is useful for testing the MR job as a whole, checking whether the map and reduce parts work together.

MRUnit is still a young project, just a couple of years old, but it is already interesting and helpful. And if I compare this approach to testing M/R jobs with the one presented by a colleague of mine, I prefer MRUnit to PigUnit. MRUnit is not better; it is made for testing “native” Java M/R jobs, and I like that implementation approach more. Pig script vs. Java M/R is a completely different topic.

Blog author

Dusan Zamurovic
