This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Apache Hadoop HDFS

  • HADOOP-10087: UserGroupInformation.getGroupNames() fails to return primary group first in the list when JniBasedUnixGroupsMappingWithFallback is used.
  • HDFS-2727 changed libhdfs so that it now uses the NameNode's block size configuration rather than the deprecated dfs.block.size client configuration.
  • HDFS-1703 changed the mechanism used to indicate which host the Secondary NameNode (2NN) should be started on. The masters file is no longer used, and instead the 2NN is started from the script unconditionally. This only affects tarball-based installations; installations that use CDH packaging are unaffected.
  • HDFS-347 changed the short-circuit local read mechanism. Short-circuit local reads are no longer supported on tarball-based installations; installations that use CDH packaging are unaffected.
  • The HdfsProxy contrib component was removed and replaced by HttpFS.
  • The ThriftFs contrib component was removed.
  • The behavior of copyFromLocal in the fsShell changed between CDH3 and CDH4. As a result, code that does any of the following will break:
    • Relies on passing unknown options or too many arguments
    • Relies on non-existent directories being auto-created
    • Uses remote path arguments that contain unescaped glob meta-characters
    • Relies on relative paths being automatically converted to a fully qualified URI
    • Attempts to parse error messages (may break)
    • Checks for specific exit codes rather than testing whether the exit code is equal to, less than, or greater than 0
    For more information see this document.
  • HDFS-4305 introduces two configurable limits: a maximum number of blocks per file (one million by default) and a minimum block size (1 MB by default). You can change these limits with the configuration settings dfs.namenode.fs-limits.max-blocks-per-file and dfs.namenode.fs-limits.min-block-size, respectively.

    This change could affect you if you use a block size smaller than 1 MB, for example in testing.
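
To see how the two HDFS-4305 limits interact, here is a minimal Python sketch that checks a planned file size and block size against them. The function name and constants are hypothetical and exist only for illustration; the values mirror the defaults quoted above and are not read from a cluster, so verify them against your effective configuration.

```python
# Defaults as described in the text above. These are assumptions for the
# sketch; check your cluster's effective configuration for the real values.
MAX_BLOCKS_PER_FILE = 1_000_000   # dfs.namenode.fs-limits.max-blocks-per-file
MIN_BLOCK_SIZE = 1024 * 1024      # dfs.namenode.fs-limits.min-block-size (1 MB)


def check_file_against_limits(file_size, block_size,
                              max_blocks=MAX_BLOCKS_PER_FILE,
                              min_block_size=MIN_BLOCK_SIZE):
    """Return a list of problems (in bytes/blocks); empty means both limits are met."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    problems = []
    if block_size < min_block_size:
        problems.append(
            f"block size {block_size} is below the minimum {min_block_size}")

    # Integer ceiling division: a partial final block still counts as a block.
    blocks_needed = (file_size + block_size - 1) // block_size
    if blocks_needed > max_blocks:
        problems.append(
            f"file needs {blocks_needed} blocks, above the maximum {max_blocks}")
    return problems
```

For example, a test that writes a 10 MB file with a 512 KB block size would be flagged for the block size alone, which is the case the note above warns about.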

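For the copyFromLocal exit-code item in the list above, one way to keep calling code robust is to treat the exit status only as zero or non-zero, and never parse error text. The Python sketch below assumes the command is invoked through a wrapper script; the helper names are hypothetical.

```python
import subprocess


def exit_status_ok(returncode):
    """Success means exactly 'exit code is 0'; any other value is a failure."""
    return returncode == 0


def run_and_check(argv):
    """Run a command (given as an argument list) and report success or failure."""
    completed = subprocess.run(argv)
    # Deliberately no comparison against a specific non-zero value and no
    # parsing of error output: both may differ between CDH3 and CDH4.
    return exit_status_ok(completed.returncode)
```

A caller might use it as run_and_check(["hadoop", "fs", "-copyFromLocal", "data.csv", "/user/alice/data.csv"]), where both paths are placeholders.
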
High-Availability Implementation Changes

CDH4 supports two high-availability (HA) implementations: shared storage using NFS and Quorum-based storage. Cloudera recommends that you use Quorum-based storage. If you currently use shared storage using NFS and want to switch to Quorum-based storage, see Switching from Shared Storage using NFS to Quorum-based Storage. In CDH5, Quorum-based storage is the only supported HA implementation.