This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

Apache Hadoop MapReduce

  Important: As of CDH4.3, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball itself. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory. You need to do some additional configuration; follow the directions below.
To use MRv1 from a tarball installation, proceed as follows:
  1. Extract the files from the tarball.
      Note: In the steps that follow, install_dir is the name of the directory into which you extracted the files.
  2. Create a symbolic link as follows:
    $ ln -s install_dir/bin-mapreduce1 install_dir/share/hadoop/mapreduce1/bin
  3. Create a second symbolic link as follows:
    $ ln -s install_dir/etc/hadoop-mapreduce1 install_dir/share/hadoop/mapreduce1/conf
  4. Set the HADOOP_HOME and HADOOP_CONF_DIR environment variables in your execution environment as follows:
    $ export HADOOP_HOME=install_dir/share/hadoop/mapreduce1 
    $ export HADOOP_CONF_DIR=$HADOOP_HOME/conf 
  5. Copy your existing start-dfs.sh and stop-dfs.sh scripts to install_dir/bin-mapreduce1.
  6. For convenience, add install_dir/bin to the PATH variable in your execution environment.
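
Steps 2 through 6 above can be sketched as a single shell function. This is a minimal sketch, not something shipped with CDH: the function name setup_mrv1 and its two arguments are hypothetical, and it assumes the tarball has already been extracted (step 1) and that both arguments are absolute paths. Absolute paths matter because a relative symbolic-link target is resolved against the directory containing the link, not against your working directory.

```shell
#!/bin/sh
# Sketch of steps 2-6 for an already-extracted tarball.
# Assumptions (not from the original text): the function name setup_mrv1,
# its two arguments, and that both arguments are absolute paths.
setup_mrv1() {
    install_dir=$1       # directory into which the tarball was extracted
    dfs_scripts_dir=$2   # hypothetical: directory holding your existing start-dfs.sh and stop-dfs.sh

    # Steps 2 and 3: link the MRv1 scripts and configuration into the MRv1 tree.
    ln -s "$install_dir/bin-mapreduce1" "$install_dir/share/hadoop/mapreduce1/bin" || return 1
    ln -s "$install_dir/etc/hadoop-mapreduce1" "$install_dir/share/hadoop/mapreduce1/conf" || return 1

    # Step 4: point Hadoop at the MRv1 tree and its configuration.
    export HADOOP_HOME="$install_dir/share/hadoop/mapreduce1"
    export HADOOP_CONF_DIR="$HADOOP_HOME/conf"

    # Step 5: copy the existing HDFS start/stop scripts next to the MRv1 scripts.
    cp "$dfs_scripts_dir/start-dfs.sh" "$dfs_scripts_dir/stop-dfs.sh" \
        "$install_dir/bin-mapreduce1/" || return 1

    # Step 6: put install_dir/bin on the PATH for convenience.
    PATH="$install_dir/bin:$PATH"
    export PATH
}
```

Because the function exports environment variables, call it in the shell you intend to work in (for example, after sourcing the file that defines it) rather than in a subshell: setup_mrv1 /path/to/install_dir /path/to/existing/dfs/scripts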

The following incompatible changes occurred between CDH3.x and CDH4.x:

  • MAPREDUCE-954 changed some context classes to interfaces (for example, JobContext, MapContext, TaskAttemptContext, and TaskInputOutputContext). This change should not require source changes in user code, because such code does not implement these interfaces, but user code (including libraries such as Pig) must be recompiled.
  • MAPREDUCE-901 changed Counter from a class to an interface. Clients must be recompiled.
  • MAPREDUCE-4053: The getGroupNames() method on org.apache.hadoop.mapred.Counters returns the new built-in counter group names, not the old ones (see the table below). Applications that expect the old counter group names may need to be updated to look for the new names. Methods that take a group name (such as getGroup() on Counters) automatically map the old name to the new name, so applications that use only these methods do not need to be modified.

Old group name                                    New group name
----------------------------------------------    ---------------------------------------------
org.apache.hadoop.mapred.Task$Counter             org.apache.hadoop.mapreduce.TaskCounter
org.apache.hadoop.mapred.JobInProgress$Counter    org.apache.hadoop.mapreduce.JobCounter
FileSystemCounters                                org.apache.hadoop.mapreduce.FileSystemCounter
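
To find application code that may still expect the old group names listed above, a rough text scan can help. The following is only a sketch: the function name find_old_counter_groups is hypothetical, it relies on grep supporting the -r option (as GNU and BSD grep do), and it reports every literal mention of an old name, including uses in getGroup() calls that need no change.

```shell
#!/bin/sh
# Hypothetical helper (not part of CDH): print file:line:text for every line
# under the given directory that mentions one of the old counter group names.
# -F treats each pattern as a fixed string, so "$" and "." match literally.
find_old_counter_groups() {
    grep -rnF \
        -e 'org.apache.hadoop.mapred.Task$Counter' \
        -e 'org.apache.hadoop.mapred.JobInProgress$Counter' \
        -e 'FileSystemCounters' \
        "$1"
}
```

For example, find_old_counter_groups /path/to/your/source lists candidate lines to review. grep exits with status 1 when nothing matches, so a non-zero status here means no old names were found.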