This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

HDFS High Availability Administration

HA Administration using the haadmin command

Now that your HA NameNodes are configured and started, you will have access to some additional commands to administer your HA HDFS cluster. Specifically, you should familiarize yourself with the subcommands of the hdfs haadmin command.

This page describes high-level uses of some important subcommands. For detailed usage information for each subcommand, run hdfs haadmin -help <command>.

failover - initiate a failover between two NameNodes

This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one of the methods succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.
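As an illustration of the behavior described above, the following is a minimal sketch of a wrapper around the failover subcommand. The function name failover_namenodes and the NameNode IDs nn1 and nn2 are hypothetical; substitute the IDs configured for your nameservice. The sketch assumes that the error returned on a failed failover is visible as a non-zero exit status.

```shell
# Sketch: fail over from one NameNode to another and report the outcome.
# The NameNode IDs passed in are whatever your HA configuration defines;
# "nn1" and "nn2" in the usage line below are hypothetical.
failover_namenodes() {
    from_nn="$1"   # NameNode expected to give up the Active role
    to_nn="$2"     # NameNode that should become Active

    if hdfs haadmin -failover "$from_nn" "$to_nn"; then
        echo "Failover from $from_nn to $to_nn succeeded"
    else
        # Assumption: a failed failover (for example, no fencing method
        # succeeded) is reported through a non-zero exit status.
        echo "Failover from $from_nn to $to_nn failed" >&2
        return 1
    fi
}

# Usage: failover_namenodes nn1 nn2
```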

getServiceState

getServiceState - determine whether the given NameNode is Active or Standby

Connects to the provided NameNode to determine its current state, and prints either "standby" or "active" to STDOUT accordingly. This subcommand can be used by cron jobs or monitoring scripts that need to behave differently depending on whether the NameNode is currently Active or Standby.
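A cron job or monitoring script of the kind mentioned above might branch on the printed state roughly as follows. This is a sketch only: the function name is_active_namenode, the NameNode ID nn1, and the run_backup command in the usage comment are all hypothetical.

```shell
# Sketch: succeed (exit status 0) only when the given NameNode reports "active".
# Returns 1 when it reports anything else, and 2 when it cannot be queried.
is_active_namenode() {
    state=$(hdfs haadmin -getServiceState "$1") || return 2
    [ "$state" = "active" ]
}

# Usage in a cron script (nn1 and run_backup are hypothetical):
#   if is_active_namenode nn1; then run_backup; fi
```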

checkHealth

checkHealth - check the health of the given NameNode

Connect to the provided NameNode to check its health. The NameNode is capable of performing some diagnostics on itself, including checking if internal services are running as expected. This command will return 0 if the NameNode is healthy, non-zero otherwise. One might use this command for monitoring purposes.

  Note:

The checkHealth command is not yet implemented, and at present will always return success, unless the given NameNode is completely down.
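For monitoring, the exit status can be mapped to a status line along the following lines. This is a sketch under assumptions: the function name report_namenode_health is hypothetical, and the exit status 2 for a critical result follows the common Nagios plugin convention rather than anything defined by hdfs haadmin. Given the note above, in this release the check is expected to fail only when the NameNode is completely down.

```shell
# Sketch: turn the checkHealth exit status into a monitoring-style result.
report_namenode_health() {
    # checkHealth returns 0 when the NameNode is healthy, non-zero otherwise.
    if hdfs haadmin -checkHealth "$1" >/dev/null 2>&1; then
        echo "OK: NameNode $1 reports healthy"
        return 0
    fi
    echo "CRITICAL: NameNode $1 is unhealthy or unreachable"
    return 2   # Nagios-style "critical" status (an assumed convention)
}

# Usage (nn1 is a hypothetical NameNode ID): report_namenode_health nn1
```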

Using the dfsadmin command when HA is enabled

When you use the dfsadmin command with HA enabled, use the -fs option to specify a particular NameNode by its RPC address or service RPC address. Not all operations are permitted on a standby NameNode. If you do not specify a NameNode, only the following operations will fail over and run on the active NameNode: setting quotas (-setQuota, -clrQuota, -setSpaceQuota, -clrSpaceQuota), reporting basic file system information (-report), and checking upgrade progress (-upgradeProgress). The "refresh" options (-refreshNodes, -refreshServiceAcl, -refreshUserToGroupsMappings, and -refreshSuperUserGroupsConfiguration) must be run on both the active and standby NameNodes.
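Because the "refresh" options must be run on both NameNodes, a small loop over the NameNode addresses can help avoid missing one. The following is a sketch only: the function name refresh_on_all_namenodes and the host names and port in the usage comment are hypothetical; use the RPC addresses from your own configuration.

```shell
# Sketch: run one dfsadmin "refresh" option against every NameNode URI given.
refresh_on_all_namenodes() {
    refresh_opt="$1"   # for example -refreshNodes
    shift
    rc=0
    for nn_uri in "$@"; do
        echo "Running $refresh_opt on $nn_uri"
        # -fs directs the command at one specific NameNode.
        hdfs dfsadmin -fs "$nn_uri" "$refresh_opt" || rc=1
    done
    return "$rc"       # non-zero if the refresh failed on any NameNode
}

# Usage (hypothetical RPC addresses):
#   refresh_on_all_namenodes -refreshNodes \
#       hdfs://nn1.example.com:8020 hdfs://nn2.example.com:8020
```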

Switching from Shared Storage using NFS to Quorum-based Storage

To switch from shared storage using NFS to Quorum-based storage, proceed as follows:
  1. Disable your current HA configuration.
  2. Redeploy HA using Quorum-based storage.

Disabling HDFS High Availability

If you need to unconfigure HA and revert to using a single NameNode, whether permanently, for testing purposes, or in order to switch to Quorum-based storage, proceed as follows:

Step 1: Shut Down the Cluster

  1. Shut down Hadoop services across your entire cluster. Do this from Cloudera Manager; or, if you are not using Cloudera Manager, run the following command on every host in your cluster:
    $ for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
  2. As root, check each host to make sure that no processes are still running as the hdfs, yarn, mapred, or httpfs users:
    # ps -aef | grep java
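The check in item 2 can also be narrowed to the four service users so that unrelated Java processes are not listed. This is a minimal sketch: the function name find_hadoop_processes is hypothetical, and it expects lines of the form "user pid command" on standard input.

```shell
# Sketch: print only the process lines owned by the Hadoop service users.
# Reads "user pid command" lines on stdin so it can be fed directly from ps.
find_hadoop_processes() {
    awk '$1 == "hdfs" || $1 == "yarn" || $1 == "mapred" || $1 == "httpfs"'
}

# Usage, as root:
#   ps -eo user=,pid=,args= | find_hadoop_processes
# No output means no processes are running as those users.
```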

Step 2: Unconfigure HA

  1. Disable the software configuration. If you intend to redeploy the same HDFS HA configuration later, comment out the HA properties rather than deleting them.
  2. Move the NameNode metadata directories on the standby NameNode.

    The location of these directories is configured via dfs.namenode.name.dir and/or dfs.namenode.edits.dir. Move them to a backup location.
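    Moving the directories might look like the following sketch. The function name backup_metadata_dirs and all paths in the usage comment are hypothetical; pass the local directories that dfs.namenode.name.dir and dfs.namenode.edits.dir point to on the standby NameNode, and run it only after the cluster has been shut down.

```shell
# Sketch: move each NameNode metadata directory to a backup location.
backup_metadata_dirs() {
    backup_root="$1"   # backup location (hypothetical; choose your own)
    shift
    mkdir -p "$backup_root" || return 1
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "Skipping missing directory $dir" >&2
            rc=1
            continue
        fi
        # Flatten the full path into one name so that directories sharing
        # a basename (for example .../1/dfs/nn and .../2/dfs/nn) do not collide.
        dest="$backup_root/$(echo "$dir" | sed 's|^/||; s|/|_|g')"
        if mv "$dir" "$dest"; then
            echo "Moved $dir to $dest"
        else
            rc=1
        fi
    done
    return "$rc"
}

# Usage (hypothetical paths):
#   backup_metadata_dirs /data/backup/nn-metadata /data/1/dfs/nn /data/2/dfs/nn
```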

Step 3: Restart the Cluster

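Start Hadoop services across your entire cluster. If you are not using Cloudera Manager, run the following command on every host in your cluster:
    $ for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done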

Redeploying HDFS High Availability with Quorum-based Storage

If you need to redeploy HA after temporarily disabling it, or if you are switching to Quorum-based storage, proceed as follows:

  1. Shut down the cluster as described in Step 1 of Disabling HDFS High Availability.
  2. Depending on whether you were previously using Quorum-based storage, or are about to switch to this method, do one of the following:
  3. Start the Quorum Journal nodes, the primary NameNode, the standby NameNode, and the other cluster services, following the instructions under HDFS High Availability Initial Deployment.

Redeploying HDFS High Availability with Shared Storage using NFS

If you need to redeploy HA with NFS shared storage after temporarily disabling it, proceed as follows:

  1. Shut down the cluster as described in Step 1 of Disabling HDFS High Availability.
  2. Uncomment the properties you commented out in Step 2 of Disabling HDFS High Availability.
  3. Start the primary NameNode, the standby NameNode, and the other cluster services, following the instructions under HDFS High Availability Initial Deployment.