Apache MapReduce

  Note: The current upstream MapReduce 2.0 (MRv2 or YARN) release is not yet considered stable, and could potentially change in non-backwards-compatible ways. Thus the current release of MRv2 should not be considered production-ready. For more information about MRv2, see New Features.

— MapReduce applications from CDH3 must be recompiled in CDH4.

Users will need to recompile their applications when going from CDH3 to CDH4 (even to use MRv1). Note that once applications have been compiled with CDH4 libraries, they will not need to be recompiled to move from MRv1 to MRv2 (YARN) in CDH4.

Bug: None

Severity: Low

Anticipated resolution: None planned

Workaround: None

— Streaming jobs may not be recovered successfully when you use CDH4 MRv1 with Cloudera Manager 4.0.x

When job recovery is enabled (mapred.jobtracker.restart.recover is set to true), streaming jobs may not be recovered successfully if you are using CDH4 with Cloudera Manager 4.0.x.

Bug: None

Severity: Low

Resolution: None; use workaround.

Workaround: Use Cloudera Manager 4.1 or later and set Automatically Restart Process to false in Cloudera Manager.

— No JobTracker becomes active if both JobTrackers are migrated to other hosts

If the JobTrackers in a High Availability configuration are shut down, migrated to new hosts, and then restarted, no JobTracker becomes active. The logs show a Mismatched address exception.

Bug: None

Severity: Low

Workaround: After shutting down the JobTrackers on the original hosts, and before starting them on the new hosts, delete the ZooKeeper state using the following command:
$ zkCli.sh rmr /hadoop-ha/<logical name>
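For example, if the logical name of the JobTracker pair were logicaljt (a placeholder; substitute the logical name from your own HA configuration), the command would be:
$ zkCli.sh rmr /hadoop-ha/logicaljt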

— Hadoop Pipes may not be usable in an MRv1 Hadoop installation done through tarballs

Under MRv1, MapReduce's C++ interface, Hadoop Pipes, may not be usable with a Hadoop installation done through tarballs unless you build the C++ code on the operating system you are using.

Bug: None

Severity: Medium

Resolution: None planned; use workaround.

Workaround: Build the C++ code on the operating system you are using. The C++ code is present under src/c++ in the tarball.

— Default port conflicts

By default, the Shuffle Handler (which runs inside the YARN NodeManager), the REST server, and many third-party applications all use port 8080. This results in conflicts if you deploy more than one of them without reconfiguring the default port.

Bug: MAPREDUCE-5036

Severity: Medium

Workaround: Make sure that at most one service uses port 8080. To change the REST server's port, consult that server's documentation. To change the default port for the Shuffle Handler, set the value of mapreduce.shuffle.port in mapred-site.xml to an unused port, as in the example below.
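For example, the following mapred-site.xml entry on the NodeManager hosts moves the Shuffle Handler off port 8080; the value 13562 is only an example of an otherwise unused port, so substitute one that is free in your environment:
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>13562</value>
  </property>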

— Task-completed percentage may be reported as slightly under 100% in the web UI, even when all of a job's tasks have successfully completed.

Bug: None

Severity: Low

Workaround: None

— Spurious warning in MRv1 jobs

JobClient does not check the mapreduce.client.genericoptionsparser.used property correctly, which leads to a spurious warning.

Bug: None

Severity: Low

Workaround: In MapReduce jobs that use GenericOptionsParser or implement Tool, set this property to true to suppress the warning, as in the example below.
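For example, for a driver that is run through ToolRunner (the JAR and class names below are placeholders), the property can be passed as a generic option at submission time:
$ hadoop jar myjob.jar com.example.MyDriver -D mapreduce.client.genericoptionsparser.used=true <input> <output>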

— Oozie workflows will not be recovered in the event of a JobTracker failover on a secure cluster

Delegation tokens created by clients (via JobClient#getDelegationToken()) do not persist when the JobTracker fails over. This limitation means that Oozie workflows will not be recovered successfully in the event of a failover on a secure cluster.

Bug: None

Severity: Medium

Workaround: Re-submit the workflow.

— Encrypted shuffle in MRv2 does not work if used with LinuxContainerExecutor and encrypted web UIs.

In MRv2, if the LinuxContainerExecutor is used (usually as part of Kerberos security) and hadoop.ssl.enabled is set to true (see Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport), encrypted shuffle does not work and the submitted job fails.

Bug: MAPREDUCE-4669

Severity: Medium

Workaround: Use encrypted shuffle with Kerberos security without encrypted web UIs, or use encrypted shuffle with encrypted web UIs without Kerberos security.

— Link from ResourceManager to Application Master does not work when the Web UI over HTTPS feature is enabled.

In MRv2 (YARN), if hadoop.ssl.enabled is set to true (use HTTPS for web UIs), then the link from the ResourceManager to the running MapReduce Application Master fails with an HTTP Error 500 because of a PKIX exception.

A job can still be run successfully, and, when it finishes, the link to the job history does work.

Bug: YARN-113

Severity: Low

Workaround: Don't use encrypted web UIs.
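Encrypted web UIs are controlled by the hadoop.ssl.enabled property described in the previous issue. As a minimal sketch, leaving it at false in core-site.xml keeps the web UIs unencrypted, so the ResourceManager link works:
  <property>
    <name>hadoop.ssl.enabled</name>
    <value>false</value>
  </property>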

— Pipes jobs compiled against MRv1 cannot be run on MRv2 (YARN).

Hadoop Pipes jobs compiled against CDH versions prior to CDH4 must be recompiled against CDH4 YARN before you can run them on the new YARN framework. Similarly, Pipes jobs compiled against CDH4 MRv1 must be recompiled against CDH4 YARN before you can run them on YARN.

Bug: MAPREDUCE-4090

Workaround: Recompile the job against CDH4 YARN to run it on MRv2.

— Hadoop client JARs don't provide all the classes needed for clean compilation of client code

The compilation does succeed, but you may see warnings as in the following example:
 $ javac -cp '/usr/lib/hadoop/client/*' -d wordcount_classes WordCount.java
org/apache/hadoop/fs/Path.class(org/apache/hadoop/fs:Path.class): warning: Cannot find annotation method 'value()' in type 'org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning 
  Note: This means that the example at the bottom of the page on managing Hadoop API dependencies (see "Using the CDH4 Maven Repository" under CDH Version and Packaging Information) will produce a similar warning.

Bug: None

Severity: Low

Workaround: None

— The ulimits setting in /etc/security/limits.conf is applied to the wrong user if security is enabled.

Bug: DAEMON-192

Severity: Low

Resolution: None; use workaround

Workaround: To increase the ulimits applied to DataNodes, you must change the ulimit settings for the root user, not the hdfs user.
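For example, entries in /etc/security/limits.conf that raise the open-file and process limits the DataNode actually receives would name root rather than hdfs; the limit values below are illustrative only:
  # domain   type  item    value
  root       soft  nofile  32768
  root       hard  nofile  65536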

— Must set yarn.resourcemanager.scheduler.address to a routable host:port when submitting a job from the ResourceManager host

When you submit a job from the host where the ResourceManager is running, yarn.resourcemanager.scheduler.address must be set to a real, routable address, not the wildcard 0.0.0.0.

Bug: None

Severity: Low

Resolution: None; use workaround

Workaround: Set the address, in the form host:port, either in the client-side configuration, or on the command line when you submit the job.
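For example, assuming the ResourceManager host is named rmhost and its scheduler listens on the default port 8030 (both are placeholders; use the values from your cluster), a driver run through ToolRunner can receive the address as a generic option:
$ hadoop jar myjob.jar com.example.MyDriver -D yarn.resourcemanager.scheduler.address=rmhost:8030 <input> <output>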