Writing a Unit Test - Hadoop

The map and reduce functions in MapReduce are easy to test in isolation, which is a consequence of their functional style: for known inputs, they produce known outputs. However, since outputs are written to an OutputCollector rather than simply being returned from the method call, the OutputCollector needs to be replaced with a mock so that its outputs can be verified. There are several Java mock object frameworks that can help build mocks; here we use Mockito, which is noted for its clean syntax, although any mock framework should work just as well.

All of the tests described here can be run from within an IDE.


We start with a test for the mapper.

The test is very simple: it passes a weather record as input to the mapper, then checks that the output is the year and temperature reading. The input key and Reporter are both ignored by the mapper, so we can pass in anything, including null as we do here. To create a mock OutputCollector, we call Mockito’s mock() method (a static import), passing the class of the type we want to mock. Then we invoke the mapper’s map() method, which executes the code being tested. Finally, we verify that the mock object was called with the correct method and arguments, using Mockito’s verify() method (again, statically imported). Here we verify that OutputCollector’s collect() method was called with a Text object representing the year (1950) and an IntWritable representing the temperature (−1.1°C).
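The shape of this test can be sketched without Hadoop or Mockito on the classpath. The sketch below is only an illustration, not the original listing: Collector is a hypothetical stand-in for OutputCollector, RecordingCollector is a hand-written test double playing the role of the Mockito mock, and the column offsets (15 to 19 for the year, 87 to 92 for the temperature) are an assumption about the fixed-width record layout. The temperature is assumed to be stored in tenths of a degree, so −1.1°C appears as -11.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's OutputCollector<Text, IntWritable>.
interface Collector {
    void collect(String key, int value);
}

// Hand-written test double: records every collect() call so a test can inspect them.
class RecordingCollector implements Collector {
    final List<String> calls = new ArrayList<>();

    @Override
    public void collect(String key, int value) {
        calls.add(key + "=" + value);
    }
}

// Simplified version 1 mapper: extracts year and temperature, no missing-value handling.
class MapperV1Sketch {
    // Assumed column offsets of the two fields in a fixed-width weather record.
    static final int YEAR_START = 15, YEAR_END = 19, TEMP_START = 87, TEMP_END = 92;

    static void map(String line, Collector output) {
        String year = line.substring(YEAR_START, YEAR_END);
        int airTemperature = Integer.parseInt(line.substring(TEMP_START, TEMP_END));
        output.collect(year, airTemperature);
    }

    // Builds a synthetic record of zeros with the two fields at the assumed offsets.
    static String record(String year, String temperature) {
        char[] filler = new char[TEMP_END + 1];
        Arrays.fill(filler, '0');
        StringBuilder sb = new StringBuilder(new String(filler));
        sb.replace(YEAR_START, YEAR_END, year);
        sb.replace(TEMP_START, TEMP_END, temperature);
        return sb.toString();
    }

    public static void main(String[] args) {
        RecordingCollector output = new RecordingCollector();
        map(record("1950", "-0011"), output);
        System.out.println(output.calls); // prints [1950=-11]
    }
}
```

With Mockito, the RecordingCollector and the final inspection of its calls list would be replaced by mock() and verify(), as described above.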

Proceeding in a test-driven fashion, we create a Mapper implementation that passes the test. Since we will be evolving the classes in this chapter, each is put in a different package indicating its version for ease of exposition. For example, v1.MaxTemperatureMapper is version 1 of MaxTemperatureMapper. In reality, of course, you would evolve classes without repackaging them.

Version 1 is a very simple implementation, which pulls the year and temperature fields from the line and emits them to the OutputCollector. Next, let’s add a test for missing values, which in the raw data are represented by a temperature of +9999.

Since records with missing temperatures should be filtered out, this test uses Mockito to verify that the collect() method on the OutputCollector is never called for any Text key or IntWritable value. Run against version 1 of the mapper, the new test fails with a NumberFormatException, as parseInt() cannot parse integers with a leading plus sign (this holds for Java versions before 7; later versions accept the sign, in which case the test fails instead because the mapper emits 9999). We therefore fix up the implementation (version 2) to handle missing values.
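The version 2 logic can be sketched as follows. This is a dependency-free approximation rather than the original listing: a BiConsumer stands in for the OutputCollector, the emitted() helper and the record() builder are hypothetical conveniences for the example, and the column offsets are the same assumed layout (year at 15 to 19, signed temperature at 87 to 92). An empty list of emitted pairs corresponds to verifying that collect() was never called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

class MapperV2Sketch {
    static final int MISSING = 9999;

    // Emits (year, temperature) unless the reading is the missing-value marker.
    static void map(String line, BiConsumer<String, Integer> output) {
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') {
            // Skip the plus sign explicitly instead of relying on parseInt() to accept it.
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        if (airTemperature != MISSING) {
            output.accept(year, airTemperature);
        }
    }

    // Hypothetical helper: builds a synthetic record with the fields at the assumed offsets.
    static String record(String year, String temperature) {
        char[] filler = new char[93];
        Arrays.fill(filler, '0');
        StringBuilder sb = new StringBuilder(new String(filler));
        sb.replace(15, 19, year);
        sb.replace(87, 92, temperature);
        return sb.toString();
    }

    // Hypothetical helper: runs the mapper and returns everything it emitted.
    static List<String> emitted(String year, String temperature) {
        List<String> calls = new ArrayList<>();
        map(record(year, temperature), (k, v) -> calls.add(k + "=" + v));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(emitted("1950", "+9999")); // prints []
        System.out.println(emitted("1950", "+0022")); // prints [1950=22]
    }
}
```

Stripping the sign by hand keeps the behavior the same regardless of which Java version runs the code.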

With the test passing, we move on to writing the reducer.


The reducer has to find the maximum value for a given key. We start with a simple test for this feature.

We construct an iterator over some IntWritable values and then verify that MaxTemperatureReducer picks the largest. We then write an implementation of MaxTemperatureReducer that passes the test. Notice that we haven’t tested the case of an empty values iterator, but arguably we don’t need to, since MapReduce would never call the reducer in this case, as every key produced by a mapper has a value.

Reducer for maximum temperature example
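A minimal sketch of the reducer logic and its test, assuming plain Integer values in place of IntWritable and a return value in place of a call to the OutputCollector (both simplifications made for this example, not the original listing):

```java
import java.util.Arrays;
import java.util.Iterator;

class MaxReducerSketch {
    // Scans the values for one key and returns the largest.
    // For an empty iterator this returns Integer.MIN_VALUE; that case is not
    // exercised, since the reducer is never called for a key with no values.
    static int reduce(String key, Iterator<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        while (values.hasNext()) {
            maxValue = Math.max(maxValue, values.next());
        }
        return maxValue;
    }

    public static void main(String[] args) {
        Iterator<Integer> values = Arrays.asList(10, 5).iterator();
        System.out.println(reduce("1950", values)); // prints 10
    }
}
```

In the Hadoop version, the final maximum would be passed to collect() together with the key, and the test would verify that call on a mock OutputCollector.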
