MapReduce has undergone a complete overhaul, and CDH4 now includes MapReduce 2.0 (MRv2). The
fundamental idea of MRv2's YARN architecture is to split the two primary
responsibilities of the JobTracker (resource management and job scheduling/monitoring)
into separate daemons: a global ResourceManager (RM) and per-application
ApplicationMasters (AM). With MRv2, the ResourceManager and per-node NodeManagers
(NM) form the data-computation framework. The ResourceManager service effectively
replaces the functions of the JobTracker, and NodeManagers run on slave nodes in place of
TaskTracker daemons. The per-application ApplicationMaster is, in effect, a framework-specific
library tasked with negotiating resources from the ResourceManager and
working with the NodeManager(s) to execute and monitor the tasks. For details of the new
architecture, see Apache Hadoop NextGen MapReduce.
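To make the division of labour concrete, here is a toy Python model of the three roles. It is only a sketch: the classes ResourceManager, NodeManager and ApplicationMaster are hypothetical stand-ins named after the daemons, not the real YARN Java API, and "resources" are simplified to a count of free container slots per node that are never released.

```python
# Toy model of the MRv2/YARN split of responsibilities.
# Hypothetical classes named after the daemons; NOT the real YARN API.
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node daemon: launches tasks in containers on one slave node."""
    host: str
    capacity: int  # free container slots (simplified resource model)
    running: list = field(default_factory=list)

    def launch(self, task: str) -> str:
        self.running.append(task)
        return f"{task}@{self.host}"


class ResourceManager:
    """Global daemon: arbitrates cluster resources; knows nothing about MapReduce."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count: int):
        """Grant up to `count` slots, filling nodes in order; returns one node per slot."""
        grants = []
        for node in self.nodes:
            while node.capacity > 0 and len(grants) < count:
                node.capacity -= 1  # slots are never released in this toy model
                grants.append(node)
        return grants


class ApplicationMaster:
    """Per-application, framework-specific: negotiates with the RM and drives the NMs."""

    def __init__(self, rm: ResourceManager, tasks):
        self.rm = rm
        self.pending = list(tasks)

    def run(self):
        placed = []
        while self.pending:
            grants = self.rm.allocate(len(self.pending))
            if not grants:
                break  # cluster full; a real AM would keep requesting
            for node in grants:
                placed.append(node.launch(self.pending.pop(0)))
        return placed


if __name__ == "__main__":
    nodes = [NodeManager("node1", 2), NodeManager("node2", 1)]
    am = ApplicationMaster(ResourceManager(nodes), ["map-0", "map-1", "reduce-0"])
    print(am.run())  # ['map-0@node1', 'map-1@node1', 'reduce-0@node2']
```

The point of the split shows up in the code: ResourceManager.allocate only counts slots, while all knowledge of which tasks exist and where they ran lives in the per-application ApplicationMaster, which is what lets YARN host frameworks other than MapReduce.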
Cloudera does not consider the current upstream MRv2 release
stable yet, and it could change in backward-incompatible ways.
Cloudera recommends that you use MRv1 unless you have particular reasons for using
MRv2, which should not be considered production-ready.
For more information about the two implementations (MRv1 and MRv2), see the discussion
under Apache Hadoop MapReduce in the "What's New in Beta 1" section of New Features in CDH4.
See also Selecting Appropriate JAR files for your MRv1 and YARN Jobs.
For installations in pseudo-distributed mode, there are separate
conf-pseudo packages: hadoop-0.20-conf-pseudo for an installation that includes MRv1,
and hadoop-conf-pseudo for an installation that includes YARN. Only one conf-pseudo
package can be installed at a time: if you want to change from one to the other, you
must first uninstall the one that is currently installed.
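The one-at-a-time rule can be expressed as a small planning step. The following Python sketch only builds the list of commands to run, it executes nothing; plan_switch and CONF_PSEUDO are hypothetical helpers, and the `sudo yum` commands assume a RHEL-compatible host (on SLES or Ubuntu/Debian you would substitute the equivalent zypper or apt-get commands).

```python
# Hypothetical helper: plan a switch between the CDH4 pseudo-distributed
# configuration packages. Assumes a yum-based (RHEL-compatible) host.
# It only returns the commands; nothing is executed.

CONF_PSEUDO = {
    "mrv1": "hadoop-0.20-conf-pseudo",  # pseudo-distributed config with MRv1
    "yarn": "hadoop-conf-pseudo",       # pseudo-distributed config with YARN (MRv2)
}


def plan_switch(installed: set, target: str) -> list:
    """Return commands that leave only the `target` conf-pseudo package installed."""
    wanted = CONF_PSEUDO[target]
    commands = []
    # Only one conf-pseudo package may be installed, so remove the other one first.
    for pkg in sorted(set(CONF_PSEUDO.values()) - {wanted}):
        if pkg in installed:
            commands.append(f"sudo yum remove {pkg}")
    if wanted not in installed:
        commands.append(f"sudo yum install {wanted}")
    return commands


if __name__ == "__main__":
    # Switching an MRv1 pseudo-distributed install over to YARN.
    for cmd in plan_switch({"hadoop-0.20-conf-pseudo"}, "yarn"):
        print(cmd)
```

Ordering the removal before the install mirrors the constraint stated above: the new conf-pseudo package cannot go in while the other one is still present.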