- Use a Map function to break down the original (large) data set into key/value pairs. These are the intermediate key/value pairs.
- Use a Reduce function to combine the intermediate key/value pairs to arrive at the final answer.
Example
Objective
Produce a word count from a large text corpus
Map Step
- Iterate through all documents in the corpus. For each document:
- Parse words
- Create a key (word) / value (count of 1) pair for each word found
Reduce Step
- Sum the counts for each intermediate key (word) to arrive at the count for each word in the corpus
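The Map and Reduce steps above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the Hadoop API: the names `map_step`, `reduce_step`, and `word_count` are hypothetical, the corpus is assumed to be an in-memory list of strings, and the tokenizing rule (lowercase runs of letters and apostrophes) is an assumption chosen for simplicity. A real framework would run many mappers and reducers in parallel and group the intermediate pairs by key between the two steps.

```python
import re
from collections import defaultdict


def map_step(document):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    # Assumed tokenization: lowercase, runs of letters/apostrophes count as words.
    for word in re.findall(r"[a-z']+", document.lower()):
        yield (word, 1)


def reduce_step(pairs):
    """Reduce: sum the counts of all intermediate pairs that share a key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


def word_count(corpus):
    """Run the Map step over every document, then the Reduce step once."""
    intermediate = (pair for document in corpus for pair in map_step(document))
    return reduce_step(intermediate)


# Hypothetical three-document corpus.
corpus = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(corpus)
print(counts)
```

Because each call to `map_step` depends only on its own document, the Map work could be split across machines without coordination; only the Reduce step needs to see every pair for a given word.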
See: http://hadoop.apache.org/mapreduce/