This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

Deploying HDFS on a Cluster

Copying the Hadoop Configuration

To customize the Hadoop configuration:

  1. Copy the default configuration to your custom directory:
    $ sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster

    You can call this configuration anything you like; in this example, it's called my_cluster.

  2. Set alternatives to point to your custom directory, as follows.

    To manually set the configuration on Red Hat-compatible systems:

    $ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50 
    $ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

    To manually set the configuration on Ubuntu and SLES systems:

    $ sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    $ sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

    For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page On Red Hat-compatible systems.

  Important:

When performing the configuration tasks in this section, and when you go on to deploy MRv1 or YARN, edit the configuration files in your custom directory (for example /etc/hadoop/conf.my_cluster). Do not create your custom configuration in the default directory /etc/hadoop/conf.dist.

Customizing Configuration Files

The following tables show the most important properties that you must configure for your cluster.

  Note:

For information on other important configuration properties, and the configuration files, see the Apache Cluster Setup page.

Property

Configuration File

Description

fs.defaultFS

conf/core-site.xml

Note: fs.default.name is deprecated. Specifies the NameNode and the default file system, in the form hdfs://<namenode host>:<namenode port>/. The default value is file///. The default file system is used to resolve relative paths; for example, if fs.default.name or fs.defaultFS is set to hdfs://mynamenode/, the relative URI /mydir/myfile resolves to hdfs://mynamenode/mydir/myfile. Note: for the cluster to function correctly, the <namenode> part of the string must be the hostname (for example mynamenode) not the IP address.

dfs.permissions.superusergroup

conf/hdfs-site.xml

Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of hadoop or pick your own group depending on the security policies at your site.

The hdfs, yarn, and mapred users belong to the hadoop group. To give users root privileges in HDFS, create a UNIX group with the same name as this group (or change the value of the configuration to correspond to an existing UNIX group) and add them to the group. The impala user also belongs to the hdfs group.

Sample Configuration

core-site.xml:

<property>
 <name>fs.defaultFS</name>
 <value>hdfs://namenode-host.company.com/</value>
</property>

hdfs-site.xml:

<property>
 <name>dfs.permissions.superusergroup</name>
 <value>hadoop</value>
</property>

Configuring Local Storage Directories

You need to specify, create, and assign the correct permissions to the local directories where you want the HDFS daemons to store data. You specify the directories by configuring the following two properties in the hdfs-site.xml file.

Property

Configuration File Location

Description

dfs.name.dir or dfs.namenode.name.dir

hdfs-site.xml on the NameNode

This property specifies the URIs of the directories where the NameNode stores its metadata and edit logs. Cloudera recommends that you specify at least two directories. One of these should be located on an NFS mount point, unless you will be using a High Availability (HA) configuration.

  Note: Cloudera recommends you specify a directory which will not exist if the NFS mount point is not present. This will prevent the NameNode from filling up the local filesystem when NFS fails to mount at all.

dfs.data.dir or dfs.datanode.data.dir

hdfs-site.xml on each DataNode

This property specifies the URIs of the directories where the DataNode stores blocks. Cloudera recommends that you configure the disks on the DataNode in a JBOD configuration, mounted at /data/1/ through /data/N, and configure dfs.data.dir or dfs.datanode.data.dir to specify file:///data/1/dfs/dn through file:///data/N/dfs/dn/.

  Note:

dfs.data.dir and dfs.name.dir are deprecated; you should use dfs.datanode.data.dir and dfs.namenode.name.dir instead, though dfs.data.dir and dfs.name.dir will still work.

Sample configuration:

hdfs-site.xml on the NameNode:

<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>

hdfs-site.xml on each DataNode:

<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>

After specifying these directories as shown above, you must create the directories and assign the correct file permissions to them on each node in your cluster.

In the following instructions, local path examples are used to represent Hadoop parameters. Change the path examples to match your configuration.

Local directories:

  • The dfs.name.dir or dfs.namenode.name.dir parameter is represented by the /data/1/dfs/nn and /nfsmount/dfs/nn path examples.
  • The dfs.data.dir or dfs.datanode.data.dir parameter is represented by the /data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn, and /data/4/dfs/dn path examples.

To configure local storage directories for use by HDFS:

  1. On a NameNode host: create the dfs.name.dir or dfs.namenode.name.dir local directories:
    $ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
      Important:

    If you are using High Availability (HA), you should not configure these directories on an NFS mount; configure them on local storage.

  2. On all DataNode hosts: create the dfs.data.dir or dfs.datanode.data.dir local directories:
    $ sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
  3. Configure the owner of the dfs.name.dir or dfs.namenode.name.dir directory, and of the dfs.data.dir or dfs.datanode.data.dir directory, to be the hdfs user:
    $ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
    Here is a summary of the correct owner and permissions of the local directories:

    Directory

    Owner

    Permissions (see Footnote 1)

    dfs.name.dir or dfs.namenode.name.dir

    hdfs:hdfs

    drwx------

    dfs.data.dir or dfs.datanode.data.dir

    hdfs:hdfs

    drwx------

    Footnote: 1 The Hadoop daemons automatically set the correct permissions for you on dfs.data.dir or dfs.datanode.data.dir. But in the case of dfs.name.dir or dfs.namenode.name.dir, permissions are currently incorrectly set to the file-system default, usually drwxr-xr-x (755). Use the chmod command to reset permissions for these dfs.name.dir or dfs.namenode.name.dir directories to drwx------ (700); for example:
    $sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
    or
    $sudo chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn
      Note:

    If you specified nonexistent directories for the dfs.data.dir or dfs.datanode.data.dir property in the conf/hdfs-site.xml file, CDH4 will shut down. (In previous releases, CDH3 silently ignored nonexistent directories for dfs.data.dir.)

Configuring DataNodes to Tolerate Local Storage Directory Failure

By default, the failure of a single dfs.data.dir or dfs.datanode.data.dir will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block that is present on the DataNode. This causes needless replications of blocks that reside on disks that have not failed.

To prevent this, you can configure DataNodes to tolerate the failure of dfs.data.dir or dfs.datanode.data.dir directories; use the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml. For example, if the value for this parameter is 3, the DataNode will only shut down after four or more data directories have failed. This value is respected on DataNode startup; in this example the DataNode will start up as long as no more than three directories have failed.

  Note:

It is important that dfs.datanode.failed.volumes.tolerated not be configured to tolerate too many directory failures, as the DataNode will perform poorly if it has few functioning data directories.

Formatting the NameNode

Before starting the NameNode for the first time you need to format the file system.

  Important:
  • Make sure you format the NameNode as user hdfs.
  • If you are re-formatting the NameNode, keep in mind that this invalidates the DataNode storage locations, so you should remove the data under those locations after the NameNode is formatted.
$ sudo -u hdfs hdfs namenode -format
  Note:

If Kerberos is enabled, do not use commands in the form sudo -u <user> <command>; they will fail with a security error. Instead, use the following commands: $ kinit <user> (if you are using a password) or $ kinit -kt <keytab> <principal> (if you are using a keytab) and then, for each command executed by this user, $ <command>

You'll get a confirmation prompt; for example:

Re-format filesystem in /data/namedir ? (Y or N)
 

Respond with an upper-case Y; if you use lower case, the process will abort.

Configuring a Remote NameNode Storage Directory

You should configure the NameNode to write to multiple storage directories, including one remote NFS mount. To keep NameNode processes from hanging when the NFS server is unavailable, configure the NFS mount as a soft mount (so that I/O requests that time out fail rather than hang), and set other options as follows:

tcp,soft,intr,timeo=10,retrans=10

These options configure a soft mount over TCP; transactions will be retried ten times (retrans=10) at 1-second intervals (timeo=10) before being deemed to have failed.

Example:

mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10, <server>:<export> <mount_point>

where <server> is the remote host, <export> is the exported file system, and <mount_point> is the local mount point.

  Note:

Cloudera recommends similar settings for shared HA mounts, as in the example that follows.

Example for HA:

mount -t nfs -o tcp,soft,intr,timeo=50,retrans=12, <server>:<export> <mount_point>

Note that in the HA case timeo should be set to 50 (five seconds), rather than 10 (1 second), and retrans should be set to 12, giving an overall timeout of 60 seconds.

For more information, see the man pages for mount and nfs.

Configuring Remote Directory Recovery

You can enable the dfs.namenode.name.dir.restore option so that the NameNode will attempt to recover a previously failed NameNode storage directory on the next checkpoint. This is useful for restoring a remote storage directory mount that has failed because of a network outage or intermittent NFS failure.

Configuring the Secondary NameNode

  Important:

The Secondary NameNode does not provide failover or High Availability (HA). If you intend to configure HA for the NameNode, skip this section: do not install or configure the Secondary Name Node (the Standby NameNode performs checkpointing). After completing the software configuration for your chosen HA method, follow the installation instructions under HDFS High Availability Initial Deployment.

In non-HA deployments, configure a Secondary NameNode that will periodically merge the EditLog with the FSImage, creating a new FSImage which incorporates the changes which were in the EditLog. This reduces the amount of disk space consumed by the EditLog on the NameNode, and also reduces the restart time for the Primary NameNode.

A standard Hadoop cluster (not a Hadoop Federation or HA configuration), can have only one Primary NameNode plus one Secondary NameNode. On production systems, the Secondary NameNode should run on a different machine from the Primary NameNode to improve scalability (because the Secondary NameNode does not compete with the NameNode for memory and other resources to create the system snapshot) and durability (because the copy of the metadata is on a separate machine that is available if the NameNode hardware fails).

Configuring the Secondary NameNode on a Separate Machine

To configure the Secondary NameNode on a separate machine from the NameNode, proceed as follows.

  1. Add the name of the machine that will run the Secondary NameNode to the conf/masters file.
  2. Add the following property to the hdfs-site.xml file:
    <property>
      <name>dfs.namenode.http-address</name>
      <value><namenode.host.address>:50070</value>
      <description>
        The address and the base port on which the dfs NameNode Web UI will listen.
      </description>
    </property>
      Note:
    • dfs.http.address is deprecated; use dfs.namenode.http-address.
    • In most cases, you should set dfs.namenode.http-address to a routable IP address with port 50070. However, in some cases such as Amazon EC2, when the NameNode should bind to multiple local addresses, you may want to set dfs.namenode.http-address to 0.0.0.0:50070on the NameNode machine only, and set it to a real, routable address on the Secondary NameNode machine. The different addresses are needed in this case because HDFS uses dfs.namenode.http-address for two different purposes: it defines both the address the NameNode binds to, and the address the Secondary NameNode connects to for checkpointing. Using 0.0.0.0 on the NameNode allows the NameNode to bind to all its local addresses, while using the externally-routable address on the the Secondary NameNode provides the Secondary NameNode with a real address to connect to.

For more information, see Multi-host SecondaryNameNode Configuration.

More about the Secondary NameNode

  • The NameNode stores the HDFS metadata information in RAM to speed up interactive lookups and modifications of the metadata.
  • For reliability, this information is flushed to disk periodically. To ensure that these writes are not a speed bottleneck, only the list of modifications is written to disk, not a full snapshot of the current filesystem. The list of modifications is appended to a file called edits.
  • Over time, the edits log file can grow quite large and consume large amounts of disk space.
  • When the NameNode is restarted, it takes the HDFS system state from the fsimage file, then applies the contents of the edits log to construct an accurate system state that can be loaded into the NameNode's RAM. If you restart a large cluster that has run for a long period with no Secondary NameNode, the edits log may be quite large, and so it can take some time to reconstruct the system state to be loaded into RAM.

When the Secondary NameNode is configured, it periodically (once an hour, by default) constructs a checkpoint by compacting the information in the edits log and merging it with the most recent fsimage file; it then clears the edits log. So, when the NameNode restarts, it can use the latest checkpoint and apply the contents of the smaller edits log.

Secondary NameNode Parameters

The behavior of the Secondary NameNode is controlled by the following parameters in {hdfs-site.xml}}.

  • dfs.namenode.checkpoint.check.period
  • dfs.namenode.checkpoint.txns
  • dfs.namenode.checkpoint.dir
  • dfs.namenode.checkpoint.edits.dir
  • dfs.namenode.num.checkpoints.retained

See http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml for details.

Enabling Trash

  Important:

The trash feature is disabled by default. Cloudera recommends that you enable it on all production clusters.

The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. If trash checkpointing is enabled, the Current directory is periodically renamed using a timestamp. Files in .Trash are permanently removed after a user-configurable time delay. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.

  Note:

The trash feature works by default only for files and directories deleted using the Hadoop shell. Files or directories deleted programmatically using other interfaces (WebHDFS or the Java APIs, for example) are not moved to trash, even if trash is enabled, unless the program has implemented a call to the trash functionality. (Hue, for example, implements trash as of CDH 4.4.)

Users can bypass trash when deleting files using the shell by specifying the -skipTrash option to the hadoop fs -rm -r command. This can be useful when it is necessary to delete files that are too large for the user's quota.

Trash is configured with the following properties in the core-site.xml file:

CDH4 Parameter

Value

Description

fs.trash.interval

minutes or 0

The number of minutes after which a trash checkpoint directory is deleted. This option can be configured both on the server and the client; in releases prior to CDH4.1 this option could be configured only on the client.

  • If trash is enabled in the server configuration, then the value configured on the server is used and the client configuration is ignored.
  • If trash is disabled in the server configuration, then the client side configuration is checked.
  • If the value of this property is zero (the default), then the trash feature is disabled.

fs.trash.checkpoint.interval

minutes or 0

The number of minutes between trash checkpoints. Every time the checkpointer runs on the NameNode, it creates a new checkpoint of the "Current" directory and removes checkpoints older than fs.trash.interval minutes. This value should be smaller than or equal to fs.trash.interval. This option is configured on the server. If configured to zero (the default), then the value is set to the value of fs.trash.interval.

For example, to enable trash so that files deleted using the Hadoop shell are not deleted for 24 hours, set the value of the fs.trash.interval property in the server's core-site.xml file to a value of 1440.
  Note:

The period during which a file remains in the trash starts when the file is moved to the trash, not when the file is last modified.

Configuring Storage-Balancing for the DataNodes

You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.

By default a DataNode writes new block replicas to disk volumes solely on a round-robin basis. As of CDH4.3, you can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.

You can configure
  • how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and
  • what percentage of new block allocations will be sent to volumes with more available disk space than others.
To configure storage balancing, set the following properties in hdfs-site.xml.
  Note: Keep in mind that if usage is markedly imbalanced among a given DataNode's storage volumes when you enable storage balancing, throughput on that DataNode will be affected initially, as writes are disproportionately directed to the under-utilized volumes.

Property

Value

Description

dfs.datanode.fsdataset. volume.choosing.policy org.apache.hadoop.hdfs.server. datanode.fsdataset. AvailableSpaceVolumeChoosingPolicy

Enables storage balancing among the DataNode's volumes.

dfs.datanode.available-space- volume-choosing-policy. balanced-space-threshold 10737418240 (default)

The amount by which volumes are allowed to differ from each other in terms of bytes of free disk space before they are considered imbalanced. The default is 10737418240 (10 GB).

If the free space on each volume is within this range of the other volumes, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis.

dfs.datanode.available-space- volume-choosing-policy. balanced-space-preference-fraction 0.75 (default) What proportion of new block allocations will be sent to volumes with more available disk space than others. The allowable range is 0.0-1.0, but set it in the range 0.5 - 1.0 (that is, 50-100%), since there should be no reason to prefer that volumes with less available disk space receive more block allocations.

Enabling WebHDFS

If you want to use WebHDFS, you must first enable it.

To enable WebHDFS:

Set the following property in hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
  Note:

If you want to use WebHDFS in a secure cluster, you must set additional properties to configure secure WebHDFS. For instructions, see Configure secure WebHDFS.

Configuring LZO

If you have installed LZO, configure it as follows.

To configure LZO:

Set the following property in core-site.xml.
  Note:

If you copy and paste the value string, make sure you remove the line-breaks and carriage returns, which are included below because of page-width constraints.

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

For more information about LZO, see Using LZO Compression.

Deploy MRv1 or YARN

The next step is to deploy MRv1 or YARN and start HDFS services. See