
Apache HDFS

— Upgrading from CDH4 Beta 1 or Earlier Requires an HDFS Upgrade

Upgrading from CDH4 Beta 1 or CDH3 requires an HDFS upgrade. See Upgrading from an Earlier CDH4 Release or Upgrading from CDH3 to CDH4 for further information. If High Availability is enabled, you must unconfigure it before upgrading from CDH4 Beta 1. See Upgrading an HDFS HA Configuration to the Latest Release for further information.

Bug: HDFS-3133

Severity: Medium

Workaround: None

— bootstrapStandby may fail to read transactions when QJM is configured

After you provision a new Standby NameNode in an HA cluster, the hdfs namenode -bootstrapStandby command may fail with the error Unable to read transaction ids 1-7 from the configured shared edits storage.

Symptoms:
% hdfs namenode -bootstrapStandby
12/08/23 17:08:53 FATAL ha.BootstrapStandby: Unable to read transaction ids 1-7 from the configured shared edits storage qjournal://qjm... Please copy these logs into the shared edits storage or call saveNamespace on the active node. 

Bug: HDFS-3752

Severity: Low

Workaround: Use rsync or a similar tool to copy the contents of the dfs.name.dir directory from the active NameNode to the new Standby NameNode, then start the Standby NameNode as normal.
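
For example, a minimal sketch of this workaround, assuming a package-based CDH4 installation, that the active NameNode is active-nn.example.com, and that dfs.name.dir (or dfs.namenode.name.dir) points to /data/1/dfs/nn; substitute the values from your own hdfs-site.xml:

# Run on the new Standby NameNode, with its NameNode process stopped and
# as a user that can read the metadata directory on the active NameNode
# and write to it locally.
rsync -av --delete active-nn.example.com:/data/1/dfs/nn/ /data/1/dfs/nn/

# Then start the Standby NameNode as normal:
sudo service hadoop-hdfs-namenode start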

— NameNode cannot use wildcard address in a secure cluster

In a secure cluster, you cannot use a wildcard for the NameNode's RPC or HTTP bind address. For example, dfs.namenode.http-address must be a real, routable address and port, not 0.0.0.0:<port>. This affects you only if you are running a secure cluster and your NameNode needs to bind to multiple local addresses. It does not affect the client port, so ordinary clients are unaffected; only administrative commands used for external access (a rare case) are affected.
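
For illustration, a hypothetical hdfs-site.xml entry that satisfies this constraint (nn1.example.com is a placeholder; 50070 is the default NameNode HTTP port):

<property>
  <name>dfs.namenode.http-address</name>
  <value>nn1.example.com:50070</value>
  <!-- must be a real, routable address, not 0.0.0.0:50070 -->
</property>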

Bug: HDFS-4448

Severity: Medium

Workaround: None

— HDFS symlinks created by FileContext are not accessible using FileSystem

An HDFS symbolic link created using the FileContext API results in an unhandled UnresolvedLinkException when it is accessed through the FileSystem API. HDFS symbolic links can be accessed successfully only by means of FileContext.

Bug: HADOOP-8040

Severity: Low

Workaround: None

— Permissions for dfs.namenode.name.dir incorrectly set

Hadoop daemons should set permissions for the dfs.namenode.name.dir (or dfs.name.dir) directories to drwx------ (700), but in fact these permissions are set to the file-system default, usually drwxr-xr-x (755).

Bug: HDFS-2470

Severity: Low

Workaround: Use chmod to set permissions to 700. See Configuring Local Storage Directories for Use by HDFS for more information and instructions.
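
For example, assuming dfs.namenode.name.dir lists /data/1/dfs/nn and /data/2/dfs/nn (placeholders; use the directories from your own hdfs-site.xml):

# Restrict each NameNode metadata directory to its owner (the hdfs user):
sudo chmod 700 /data/1/dfs/nn /data/2/dfs/nn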

The hdfs fsck and dfsadmin commands require that an HDFS file system be specified when federation is enabled

The hdfs fsck and dfsadmin commands do not work if fs.defaultFS is configured to use a viewfs file system. You must explicitly specify a particular HDFS file system using the -fs option.

Bug: HDFS-3483

Severity: Low

Workaround: Explicitly specify a particular HDFS file system using the -fs option
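
For example, assuming one of your federated NameNodes is reachable at namenode1.example.com:8020 (a placeholder):

hdfs fsck -fs hdfs://namenode1.example.com:8020 /user
hdfs dfsadmin -fs hdfs://namenode1.example.com:8020 -report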

— The default setting of dfs.client.block.write.replace-datanode-on-failure.policy can cause an unrecoverable error in small clusters

The default setting of dfs.client.block.write.replace-datanode-on-failure.policy (DEFAULT) can cause an unrecoverable error in a small cluster during HBase rolling restart.

Bug: HDFS-5131

Severity: Medium

Workaround: Set dfs.client.block.write.replace-datanode-on-failure.policy to NEVER for 1-, 2-, or 3-node clusters, and leave it as DEFAULT for all other clusters. Leave dfs.client.block.write.replace-datanode-on-failure.enable set to true.
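
As a sketch, the corresponding hdfs-site.xml entries for a 1-, 2-, or 3-node cluster; apply them to the configuration read by the affected HDFS clients (in the rolling-restart case above, the HBase hosts):

<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>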

— hadoop fsck -move does not work in a cluster with host-based Kerberos

Bug: None

Severity: Low

Anticipated resolution: None; use workaround.

Workaround: Use hadoop fsck -delete

— HttpFS cannot get delegation token without prior authenticated request

A request to obtain a delegation token cannot initiate an SPNEGO authentication sequence; it must be accompanied by an authentication cookie from a prior SPNEGO authentication sequence.

Bug: HDFS-3988

Severity: Low

Workaround: Make another WebHDFS request (such as GETHOMEDIRECTORY) to initiate an SPNEGO authentication sequence, and then make the delegation token request.
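
A sketch of this sequence using curl with SPNEGO support; httpfs-host.example.com, the cookie file, and the renewer value are placeholders, and 14000 is the default HttpFS port:

# First request: completes SPNEGO authentication and saves the resulting
# authentication cookie.
curl --negotiate -u : -c /tmp/httpfs.cookie \
  "http://httpfs-host.example.com:14000/webhdfs/v1/?op=GETHOMEDIRECTORY"

# Second request: reuses the cookie to obtain the delegation token.
curl -b /tmp/httpfs.cookie \
  "http://httpfs-host.example.com:14000/webhdfs/v1/?op=GETDELEGATIONTOKEN&renewer=hdfs"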

— DistCp does not work between a secure cluster and an insecure cluster

Bug: HADOOP-10016, HADOOP-8828

Severity: High

Workaround: None

— DistCp from a secure CDH3 cluster using KSSL fails

The CDH4 Hftp client does not support using KSSL (Kerberized SSL) to authenticate to CDH3 clusters. KSSL support was removed from CDH4 because KSSL uses weak encryption types for Kerberos tickets; HTTP authentication to the NameNode now uses SPNEGO by default. Users of CDH3 releases with Kerberos security enabled must enable SPNEGO in order to support DistCp from a secure CDH3 cluster to a secure CDH4 cluster.

Bug: HDFS-3699

Severity: High

Resolution: None; use workaround.

Workaround: Switch the CDH3 cluster from KSSL to SPNEGO authentication

— Using DistCp with Hftp on a secure cluster using SPNEGO requires that the dfs.https.port property be configured

To use DistCp with Hftp from a secure cluster that uses SPNEGO, you must configure the dfs.https.port property on the client to use the HTTP port (50070 by default).

Bug: HDFS-3983

Severity: Low

Workaround: Configure dfs.https.port to use the HTTP port on the client
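
A minimal sketch, assuming DistCp is run from the destination cluster and that source-nn.example.com and dest-nn.example.com are placeholders. The -D generic option sets dfs.https.port for this client invocation to the source NameNode's HTTP port; you can also set the property in the client's hdfs-site.xml:

hadoop distcp -D dfs.https.port=50070 \
  hftp://source-nn.example.com:50070/user/project \
  hdfs://dest-nn.example.com:8020/user/project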