The logging level of an individual MapReduce job in MRv1 can be set; see Tips and
Guidelines for details.
The performance of the MapReduce shuffle handler and IFile reader can be
improved for either MRv1 or YARN by using native Linux system calls that cache
data before the shuffle or merge operations. See Tips and
Guidelines for more details.
MapReduce performance improvements for MRv1, including optimizations that reduce job
latency for small jobs. See Tips and
Guidelines for more details.
MapReduce jobs can query individual job status with
MapReduce job recovery for MRv1. If the JobTracker is shut down or crashes, on
restart it automatically resubmits all jobs that were running at the time of the
shutdown or crash. All recovered jobs are rerun from the beginning; all
output from the incomplete run is deleted before resubmission.
Apache Hadoop Security using Kerberos
The log file names for Hadoop security using Kerberos have been changed to avoid
potential conflict: HDFS security logs are now written to
SecurityAuth-hdfs.audit while MapReduce security logs are
written to SecurityAuth-mapred.audit.
Update to base version of Flume 1.2.0
Major improvements to the file channel, including on-disk encryption
New, higher-throughput Asynchronous HBase sink
New, much faster, syslog TCP source capable of listening on many ports
Added exponential backoff for failed nodes in the load-balancing RPC client
and the Avro sink.
Included "stock" interceptors, such as ones that annotate events with the
current hostname or timestamp
New monitoring support for JMX, Ganglia, and HTTP
Significantly expanded user documentation
Many other enhancements and fixes
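The exponential backoff for failed nodes can be illustrated with a minimal Java sketch. Each consecutive failure doubles the time a node is kept out of rotation, up to a cap. The class name, the 1-second base, and the 30-second cap are assumptions chosen for illustration. They are not Flume's defaults or configuration property names.

```java
/**
 * Hypothetical sketch of exponential backoff for a failed node:
 * each consecutive failure doubles the delay, bounded by a maximum.
 * BASE_MS and MAX_MS are illustrative assumptions, not Flume defaults.
 */
public class BackoffSketch {
    static final long BASE_MS = 1_000;  // assumed delay after the first failure
    static final long MAX_MS = 30_000;  // assumed upper bound on the delay

    /** Delay in milliseconds after the given number of consecutive failures. */
    static long backoffMillis(int consecutiveFailures) {
        if (consecutiveFailures < 1) {
            return 0;                    // healthy node: no backoff
        }
        // Limit the shift so the multiplication cannot overflow a long.
        int shift = Math.min(consecutiveFailures - 1, 30);
        long delay = BASE_MS << shift;   // BASE_MS * 2^(failures - 1)
        return Math.min(delay, MAX_MS);
    }

    public static void main(String[] args) {
        for (int failures = 1; failures <= 7; failures++) {
            System.out.println(failures + " failure(s) -> " + backoffMillis(failures) + " ms");
        }
    }
}
```

The cap keeps a node that has been down for a long time from being excluded indefinitely once it recovers.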
Target directory for Hive import no longer needs to match the table name.
Microsoft SQL Server connector and OraOop are now supported.
The --columns argument is now supported for export.
Microsoft SQL table names that include hyphens are now supported.
Hue can now be configured so that users can only see Beeswax queries
that they issued or saved. With the default configuration, any Beeswax query can
be viewed by any user. The new share_saved_queries property controls this sharing;
when set to "false", saved or executed queries can be viewed only by
the owner or a Hue administrator.
The Job Browser configuration now supports the share_jobs property which, when set
to "false", prevents a user from viewing information about jobs submitted by
other users; an administrator can view jobs for all users. The default behavior
allows all users to see all jobs.
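The share_saved_queries and share_jobs settings follow the same visibility rule. The sketch below restates that rule in Java only as an illustration. Hue itself is not written in Java, and the class and method names are hypothetical.

```java
/**
 * Hypothetical model of the Hue visibility rule for share_saved_queries
 * (Beeswax) and share_jobs (Job Browser). When sharing is enabled (the
 * default), every user can view an item. When it is "false", only the
 * item's owner or an administrator can view it.
 */
public class HueVisibilitySketch {
    static boolean canView(boolean sharingEnabled, String viewer, String owner, boolean viewerIsAdmin) {
        return sharingEnabled || viewerIsAdmin || viewer.equals(owner);
    }

    public static void main(String[] args) {
        System.out.println(canView(true, "bob", "alice", false));    // true: default, shared
        System.out.println(canView(false, "bob", "alice", false));   // false: not owner, not admin
        System.out.println(canView(false, "alice", "alice", false)); // true: owner
        System.out.println(canView(false, "root", "alice", true));   // true: administrator
    }
}
```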
Retired Jobs can now be viewed through the Job Browser in Hue. The information
is less complete than the information displayed for Recent Jobs.
Hue now provides an Oozie application for creating workflows of MapReduce,
Streaming, Java, Pig, Hive, Sqoop, Shell, and SSH jobs and for scheduling them.
Hue is now available in German, Spanish, French, Japanese, Korean, Portuguese,
Brazilian Portuguese, and Simplified Chinese.
LIMIT/SAMPLE operators can take expressions other than constant values. PIG-1926
A default SPLIT destination can be specified with the OTHERWISE keyword. PIG-1904
Syntactic sugar for TOTUPLE, TOBAG, and TOMAP has been added. PIG-1387
AvroStorage now supports globs and commas. PIG-2492
AvroStorage now supports recursive records. PIG-2875
CDH4.1 includes the DataFu collection of Useful Apache Pig UDFs (User-Defined
Functions) for statistical analysis. See Pig
Installation for installation instructions.
Oozie workflow, coordinator and bundle XML definitions 0.4 support a
parameters element defining the expected job parameters and
default values, if any. If present, the parameters element
enables an early verification of the submitted job configuration.
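The early verification enabled by the parameters element can be sketched in Java under a simple assumption. Every declared parameter must either be supplied in the submitted configuration or have a default, which is then filled in. This is not Oozie's implementation, and the class, method, and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of checking a submitted job configuration against
 * declared parameters: a declared parameter with a null default is required.
 */
public class ParameterCheckSketch {
    /**
     * @param declared  parameter name -> default value (null if none)
     * @param submitted configuration supplied at submission time
     * @return effective configuration with defaults filled in
     * @throws IllegalArgumentException if required parameters are missing
     */
    static Map<String, String> resolve(Map<String, String> declared, Map<String, String> submitted) {
        Map<String, String> effective = new HashMap<>(submitted);
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> p : declared.entrySet()) {
            if (effective.containsKey(p.getKey())) {
                continue;                                  // supplied by the submitter
            }
            if (p.getValue() != null) {
                effective.put(p.getKey(), p.getValue());   // fall back to the default
            } else {
                missing.add(p.getKey());                   // required but absent
            }
        }
        if (!missing.isEmpty()) {
            Collections.sort(missing);                     // stable error message
            throw new IllegalArgumentException("Missing job parameters: " + missing);
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> declared = new HashMap<>();
        declared.put("inputDir", null);         // required, no default
        declared.put("outputDir", "/tmp/out");  // optional, has a default
        Map<String, String> submitted = new HashMap<>();
        submitted.put("inputDir", "/data/in");
        System.out.println(resolve(declared, submitted));
    }
}
```

Failing at submission time, instead of partway through a workflow run, is the benefit the release note describes.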
Oozie workflow XML definition 0.4 supports a global
configuration section which is inherited by all actions. This
global section can be used to define key/value pairs common to all actions
in the workflow, such as the job-tracker URI, the name-node URI,
and configuration properties. Values defined at the action level have precedence
over global values.
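The precedence rule for the global section can be modeled in a few lines of Java. Each action starts from every global key/value pair, and its own definitions then overwrite matching keys. This is a sketch of the rule as stated, not Oozie code. The host names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of global-section inheritance with action-level precedence. */
public class GlobalConfigSketch {
    static Map<String, String> effectiveActionConfig(Map<String, String> global,
                                                     Map<String, String> action) {
        Map<String, String> merged = new LinkedHashMap<>(global); // inherit every global key/value
        merged.putAll(action);                                    // action-level values take precedence
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> global = new LinkedHashMap<>();
        global.put("job-tracker", "jt.example.com:8021");         // hypothetical hosts
        global.put("name-node", "hdfs://nn.example.com:8020");
        global.put("mapred.job.queue.name", "default");

        Map<String, String> action = new LinkedHashMap<>();
        action.put("mapred.job.queue.name", "etl");               // overrides the global value

        Map<String, String> conf = effectiveActionConfig(global, action);
        System.out.println(conf.get("job-tracker"));           // jt.example.com:8021
        System.out.println(conf.get("mapred.job.queue.name")); // etl
    }
}
```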
Oozie workflow XML definition 0.4 supports multiple job-xml
elements in action definitions. If an action has multiple
job-xml elements, the property key/value pairs of those
configuration files are loaded into the action configuration; if a property key
occurs in multiple configuration files, the last occurrence takes precedence.
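The following Java sketch models how multiple job-xml files combine. It parses Hadoop-style `<configuration>` documents with the JDK's DOM parser and applies them in declaration order, so a later file overwrites keys set by an earlier one. It illustrates the stated rule and is not Oozie's loading code. The property values are examples.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: merge job-xml files so the last occurrence of a key wins. */
public class JobXmlMergeSketch {
    /** Parses one Hadoop-style configuration document into name/value pairs. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    /** Applies job-xml files in declaration order; later files overwrite earlier keys. */
    static Map<String, String> merge(List<String> jobXmlsInOrder) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String xml : jobXmlsInOrder) {
            conf.putAll(parse(xml));
        }
        return conf;
    }

    public static void main(String[] args) {
        String first = "<configuration>"
                + "<property><name>mapred.reduce.tasks</name><value>4</value></property>"
                + "<property><name>mapred.job.queue.name</name><value>default</value></property>"
                + "</configuration>";
        String second = "<configuration>"
                + "<property><name>mapred.reduce.tasks</name><value>10</value></property>"
                + "</configuration>";
        Map<String, String> conf = merge(Arrays.asList(first, second));
        System.out.println(conf.get("mapred.reduce.tasks"));   // 10 (last occurrence wins)
        System.out.println(conf.get("mapred.job.queue.name")); // default
    }
}
```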
Oozie workflow MapReduce actions now support über JARs. The über JAR must be
specified in the MapReduce action configuration section using the
oozie.mapreduce.uber.jar property. To use über
JARs, the Oozie server must first be configured by setting the
oozie.action.mapreduce.uber.jar.enable property to
true in oozie-site.xml.
Oozie logs now self-purge (by default after 30 days). Oozie logs can also be
GZIPped when rolled.
Oozie workflow control nodes (start/end/fork/join/kill) are now
treated as "action" nodes and show up in the list of executed workflow actions.
Oozie now supports alternate share libraries. This enables the use of alternate
sets of JARs for a given action type. Alternate share libraries can be
configured at server level, job level and action level.
Oozie filesystem action now supports touchz (to create a file
of zero length) and recursive chmod operations.
Oozie now supports submission of MapReduce jobs, without having to write a
workflow, using the Oozie mapreduce subcommand.
CDH4.1 upgrades Hive from version 0.8.1 to version 0.9.0. The new version of
Hive includes approximately 150 bug fixes and feature enhancements not found in
the previous version.
HIVE-2935 adds HiveServer2, an improved version of
HiveServer that supports a new Thrift API tailored to JDBC and ODBC clients,
Kerberos authentication, and multi-client concurrency. This patch also adds a
new JDBC driver designed to run on top of HiveServer2, and a new CLI for
HiveServer2 named BeeLine.
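A minimal Java sketch of a client for the new JDBC driver follows. It assumes several things: the HiveServer2 JDBC jar and its dependencies are on the classpath; the `jdbc:hive2://host:port/database` URL form applies, with an optional `;principal=` parameter for Kerberos; and the server listens on its usual default port, 10000. The host name is hypothetical. The program connects only when a user name is passed as an argument.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveServer2Sketch {
    // Driver class from the HiveServer2 JDBC client; needed on the classpath only to connect.
    static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    /** Builds a HiveServer2 JDBC URL; pass a Kerberos principal for secure clusters, else null. */
    static String jdbcUrl(String host, int port, String database, String kerberosPrincipal) {
        String url = "jdbc:hive2://" + host + ":" + port + "/" + database;
        return kerberosPrincipal == null ? url : url + ";principal=" + kerberosPrincipal;
    }

    /** Connects to HiveServer2 and prints the tables in the URL's database. */
    static void listTables(String url, String user) throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER); // explicit registration for older JDBC setups
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String url = jdbcUrl("hs2.example.com", 10000, "default", null); // hypothetical host
        System.out.println(url); // jdbc:hive2://hs2.example.com:10000/default
        if (args.length > 0) {
            listTables(url, args[0]); // connect only when a user name is supplied
        }
    }
}
```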
HIVE-3277 adds MetaStore audit logging for all
connection types, both secure and non-secure.
HIVE-2957 improves the JDBC driver's support for
TIMESTAMP column types.
HIVE-3056 adds the metatool utility
which facilitates bulk updates of metastore catalog records.
Many bug fixes and robustness enhancements to HBase 0.92.
The K-Means, Fuzzy K-Means, Canopy, and Dirichlet algorithms have been
reimplemented on top of ClusterClassificationDriver, which refactors
clustering and adds outlier-pruning support. See MAHOUT-981, MAHOUT-984, MAHOUT-982, and MAHOUT-983.