This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

What's New in CDH 4.5.0

Apache Flume

New Features:

  • FLUME-2190 - Included a new Twitter Source that feeds off the Twitter firehose.
  • FLUME-2109 - HTTP Source now supports HTTPS.
  • FLUME-1666 - Syslog TCP Source can now keep timestamp and process fields in the event body.
  • FLUME-2202 - AsyncHBaseSink can now coalesce increments to the same row and column per transaction to reduce the number of RPC calls.
  • FLUME-2189 - Avro Source can now accept events from a restricted set of peers.
  • FLUME-2052 - Spooling Directory Source can now ignore or replace malformed characters.
  • Flume auto-detects Cloudera Search dependencies.

Changed Feature:

  • FLUME-2233 - Memory Channel calculates byte capacity usage on transaction commits instead of puts to improve performance.

Apache Hive

New Feature:

Hue

New Features:
  • Added support for a SAML authentication backend, along with other security fixes.
Changed Features:
  • HUE-1609 - [core] LDAP backend and import should be case insensitive.
  • HUE-1632 - [oozie] Workflow with & in a property fails to submit.
  • HUE-1555 - [hbase] Python 2.4 support.
  • HUE-1521 - [core] Improve JobTracker HA.
  • [search] Default template should display all the fields.
  • [core] Make search bind authentication optional for LDAP.

Apache MapReduce v1 (MRv1)

New Features:
  • Track HDFS accesses: When mapreduce.job.token.tracking.ids is set to true, an MRv1 job keeps track of the HDFS tokens it uses to access HDFS data. In addition, the HDFS audit logs capture information about the jobs accessing data.
  • Stack traces on task timeout: To ease debugging, MRv1 tasks dump their stack traces when they time out.
  • KeyOnlyTextInputWriter and KeyOnlyTextOutputReader enable streaming jobs to write/read text without separators.
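
As a sketch of the token-tracking setting described above, the fragment below turns it on through the job configuration. Placing it in mapred-site.xml on the submitting client is an assumption, not something these notes specify; as a job-level property it could presumably also be set per job.

```xml
<!-- Assumed location: mapred-site.xml on the client that submits the job -->
<property>
  <name>mapreduce.job.token.tracking.ids</name>
  <value>true</value>
</property>
```
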
Changed Feature:
  • Users no longer need to set environment variables differently when using the scripts under the bin-mapreduce1 directory in MRv1 tarballs.

Apache MapReduce v2 (YARN)

New Features:
  • Track HDFS accesses: When mapreduce.job.token.tracking.ids is set to true, a job keeps track of the HDFS tokens it uses to access HDFS data. In addition, the HDFS audit logs capture information about the jobs accessing data.
  • KeyOnlyTextInputWriter and KeyOnlyTextOutputReader enable streaming jobs to write/read text without separators.
  • The Fair Scheduler can now be configured to decouple decisions from node heartbeats, resulting in faster scheduling.
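
A minimal sketch of enabling heartbeat-independent scheduling follows. These notes do not name the settings involved, so the property names below are an assumption taken from upstream Apache Hadoop and may differ in this release. The sketch also assumes the Fair Scheduler is already the configured scheduler.

```xml
<!-- Assumed location: yarn-site.xml on the ResourceManager.
     Property names are assumed from upstream Hadoop; verify them for your release. -->
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Pause between scheduling passes, in milliseconds; 5 is an illustrative value -->
  <name>yarn.scheduler.fair.continuous-scheduling-sleep-ms</name>
  <value>5</value>
</property>
```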

Apache Oozie

New Feature:
  • The Pig and Hive actions can now access Parquet files with no manual steps or configuration needed.

Apache Sentry (incubating)

New Features:

  • Access to the Hive Metastore Service can be secured without IPTables. To restrict access to the Hive Metastore Service to only the users that HiveServer2 and ImpalaD run as, add these users to core-site.xml.

    In the example below, hivemetastore is the user that the Hive Metastore Service runs as, and hive and impala are the users that HiveServer2 and ImpalaD run as, respectively. Only these users are allowed to connect to the Hive Metastore Service.

    <property>
      <name>hadoop.proxyuser.hivemetastore.groups</name>
      <value>hive, impala</value>
    </property>
  • Sentry is now integrated with Cloudera Search. See Configuring Sentry for Search for more information.