This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

What's New in CDH4 Beta 2

Apache Hadoop Common

  • Hadoop network topology may now be specified in a configuration file (HADOOP-7030).
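
As a rough illustration of a file-based topology, the sketch below assumes a plain-text table with one "host rack" pair per line and a fallback rack of /default-rack for hosts that are not listed. The file layout, the sample hosts, and the function names are assumptions made for illustration and are not taken from these release notes; consult HADOOP-7030 for the actual behavior and configuration properties.

```python
DEFAULT_RACK = "/default-rack"  # assumed fallback for hosts missing from the table


def parse_topology_table(text):
    """Parse 'host rack' lines into a dict, skipping blank or malformed lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore anything that is not exactly two columns
        host, rack = fields
        table[host] = rack
    return table


def resolve(table, hosts):
    """Return the rack for each host, using DEFAULT_RACK when a host is unknown."""
    return [table.get(host, DEFAULT_RACK) for host in hosts]


# Hypothetical table contents; addresses and rack names are placeholders.
SAMPLE = """\
192.168.1.10 /rack1
192.168.1.11 /rack1
node3.example.com /rack2
"""

if __name__ == "__main__":
    table = parse_topology_table(SAMPLE)
    print(resolve(table, ["192.168.1.10", "node3.example.com", "unlisted-host"]))
```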

Apache Hadoop HDFS

  • Automatic Failover for HDFS High Availability is now available. When it is enabled, the standby NameNode automatically becomes active, without administrator intervention, if the active NameNode fails. See the CDH4 High Availability Guide for more information on how to configure and deploy automatic failover.
  • The HDFS High Availability (HA) feature is now supported when Kerberos security is enabled. See the CDH4 Security Guide for instructions on how to enable HA with Kerberos security.
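
The CDH4 High Availability Guide remains the reference for configuring automatic failover. As an illustrative sketch only, the Python below renders Hadoop-style configuration XML for the two properties that upstream Hadoop uses for this feature: dfs.ha.automatic-failover.enabled (normally placed in hdfs-site.xml) and ha.zookeeper.quorum (normally placed in core-site.xml). The ZooKeeper host names are placeholders and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_configuration(properties):
    """Render a dict of name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Fragment intended for hdfs-site.xml
    print(render_configuration({"dfs.ha.automatic-failover.enabled": "true"}))
    # Fragment intended for core-site.xml; host names are placeholders
    print(render_configuration(
        {"ha.zookeeper.quorum": "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"}
    ))
```

The generated fragments still need to be merged into the existing configuration files by hand or by your deployment tooling.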

Apache Hadoop MapReduce

  • In CDH4 Beta 2, if you install from packages, the default is to deploy the original MapReduce framework (MRv1). If you are upgrading from CDH3, you must still recompile your CDH3 MapReduce applications to run on the CDH4 version of MRv1. You can also choose to install and deploy MapReduce 2.0 (MRv2), which is built on the YARN framework. Once you have recompiled your applications to run in CDH4, you can move them from MRv1 to MRv2 without further recompilation. Cloudera does not support running MRv1 and YARN daemons on the same nodes at the same time; doing so degrades performance and may result in an unstable cluster deployment. Note that if you install CDH4 Beta 2 from the hadoop-0.23.1 tarball rather than from a package, only the YARN framework components are installed.

Important: The current upstream MRv2 release is not yet considered stable, and could potentially change in non-backwards-compatible ways. Thus the current release of MRv2 should not be considered production-ready.

See the information under What's New in CDH4 Beta 1, specifically the section on MapReduce, for more information about the two MapReduce frameworks and their differences.

Apache Flume

  • Flume 1.1.0: Flume 1.1.0 represents a significant refactoring from the previous version, Flume 0.9.x. The goal has been an easy-to-use, easy-to-extend package. Flume 1.1.0 retains the general approach to data transfer and handling. As Flume events are received by a source, the source stores them in a channel, a passive store that keeps each event until it is consumed by a Flume sink. A Flume agent is a process that hosts the components (sources, channels, sinks) through which these events flow. The source and sink within a given agent run asynchronously, with the events staged in the channel. With Flume 1.x there is no longer a Flume Master, and there is no dependency on ZooKeeper. Thrift and Avro legacy Flume sources are provided to enable sending events from Flume 0.9.4 to Flume 1.1.0. For more details on the Flume architecture and user interface, refer to the Apache Flume Wiki.
  • Flume 0.9.x: For information on Flume 0.9.x, see the Flume 0.9.x documentation. Flume 0.9.x can be installed instead of Flume 1.x, but you cannot install both Flume 0.9.x and Flume 1.x together on the same host.
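
The source, channel, and sink relationship described for Flume 1.x can be pictured with a small producer/consumer model. The sketch below is a toy in plain Python, not Flume code: a bounded queue stands in for the channel, and two threads stand in for a source and a sink running asynchronously inside one agent. All names and the channel capacity are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling the sink that the source has finished


def run_agent(events, capacity=2):
    """Move events from a source thread to a sink thread through a bounded channel."""
    channel = queue.Queue(maxsize=capacity)  # passive store that stages events
    delivered = []

    def source():
        for event in events:
            channel.put(event)  # blocks while the channel is full
        channel.put(_STOP)

    def sink():
        while True:
            event = channel.get()  # blocks while the channel is empty
            if event is _STOP:
                break
            delivered.append(event)

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return delivered


if __name__ == "__main__":
    print(run_agent(["event-1", "event-2", "event-3"]))
```

Because the channel holds events until the sink takes them, the source never needs to know how fast the sink is running; it only waits when the channel is full.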

Apache HBase

  • Updated to upstream version 0.92.1

Apache Hive

  • Updated to upstream version 0.8.1
  • Incorporates table/partition properties improvements (HIVE-2589)

Hue 2.0.0

  • The front end has been re-implemented as full-screen pages. Whole-page loading has replaced the old "desktop" model. In the process, the front end was rewritten with jQuery and Bootstrap, which fixes memory leaks in older browsers. (HUE-585)
  • Hue accesses HDFS via WebHDFS or HttpFS. It no longer requires a plug-in on the NameNode or DataNodes. (HUE-610)
  • Hue submits MapReduce jobs via Oozie. This fixes a security hole in the previous scheme of Hue directly running the client job jar. (HUE-611)
  • Hue supports LDAP (OpenLDAP and Active Directory). Hue can be configured to authenticate against LDAP. Additionally, Hue can import users and groups from LDAP, and refresh group membership from LDAP. (HUE-607, HUE-614, HUE-615)
  • Hue supports per-application authorization. Administrators can grant or limit group access to applications. (HUE-608)
  • An on/off switch has been added to either restrict users to viewing their own jobs or keep all jobs accessible to everybody (the previous behavior).
  • A switch has been added to Beeswax to restrict other users' access to personal saved queries (HUE-688, HUE-701).
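
For context on the WebHDFS access mentioned in this list, WebHDFS is a REST interface whose URLs take the form http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The sketch below builds such a URL. The host name, the default port of 50070, and the function name are assumptions for illustration, and the code does not reflect how Hue itself is implemented.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation name."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    # The operation goes first in the query string, followed by any extra parameters.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Placeholder host; LISTSTATUS lists the contents of a directory.
    print(webhdfs_url("namenode.example.com", "/user/alice", "LISTSTATUS"))
```

An HttpFS gateway exposes the same URL layout, so pointing the sketch at one should only require a different host and port.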

Apache Mahout

  • Updated to upstream version 0.6

Apache Oozie

  • Updated to upstream version 3.1.3

New features:

  • Support for bundle jobs, which allow management of multiple coordinator jobs as a single job
  • Support for Hadoop MRv2
  • A per-action Oozie share library, which avoids conflicts among action JARs
  • Support for proxyuser, enabling an Oozie user to submit jobs on behalf of other users
  • Workflow fork/join pairing validation
  • A database creation/migration tool
  • The coordinator job lifecycle has been redesigned and is more robust
  • Several internal components have been rewritten
  • Better scalability due to command de-duping
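
The idea behind command de-duping can be sketched as a queue that refuses a command when an equivalent one is already waiting. This is a simplified illustration, not Oozie's implementation; the class name, the use of a string key to identify equivalent commands, and the method names are all hypothetical.

```python
from collections import deque


class DedupQueue:
    """FIFO queue that drops a command if one with the same key is already queued."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()  # keys of commands currently waiting in the queue

    def offer(self, key, command):
        """Queue the command unless its key is already pending; return True if queued."""
        if key in self._pending:
            return False
        self._pending.add(key)
        self._queue.append((key, command))
        return True

    def poll(self):
        """Remove and return the oldest command, or None when the queue is empty."""
        if not self._queue:
            return None
        key, command = self._queue.popleft()
        self._pending.discard(key)  # the same key may be queued again from now on
        return command


if __name__ == "__main__":
    q = DedupQueue()
    print(q.offer("job-1:check", "first"))      # queued
    print(q.offer("job-1:check", "duplicate"))  # rejected while the first is pending
    print(q.poll())
```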

Apache Sqoop

  • Updated to upstream version 1.4.1

New features:

  • Hadoop MRv2 compatibility (SQOOP-397)
  • Hadoop 1.0 compatibility (SQOOP-420)
  • Customized environment variables (SQOOP-438)
  • Customized type mapping for AVRO format (SQOOP-362)
  • Compression support for AVRO format (SQOOP-428)
  • Incremental support for free form queries (SQOOP-444)
  • MySQL YEAR data type support (SQOOP-352)
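
As a hedged sketch of an incremental import driven by a free-form query, the function below assembles a sqoop import argument list. The flags used (--connect, --query, --incremental, --check-column, --last-value, --target-dir, --num-mappers) are standard Sqoop 1 import options; the JDBC URL, table, and column names are made-up placeholders, and the helper itself is hypothetical.

```python
import shlex


def incremental_query_import(connect, query, check_column, last_value, target_dir):
    """Return the argv list for an append-mode incremental import of a free-form query."""
    # Sqoop expects the literal token $CONDITIONS in the WHERE clause of a free-form query.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form queries must contain the $CONDITIONS token")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--query", query,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
        "--target-dir", target_dir,
        "--num-mappers", "1",  # a single mapper avoids the need for a split column
    ]


if __name__ == "__main__":
    argv = incremental_query_import(
        connect="jdbc:mysql://db.example.com/shop",         # placeholder JDBC URL
        query="SELECT id, total FROM orders WHERE $CONDITIONS",
        check_column="id",
        last_value=1000,
        target_dir="/user/alice/orders",
    )
    # shlex.join quotes the query so the shell does not expand $CONDITIONS
    print(shlex.join(argv))
```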

Apache Whirr

  • Updated to upstream version 0.7.1
  • Support for YARN and MR2 (WHIRR-391)

Apache ZooKeeper