This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Hardware Configuration for HDFS HA

This section describes the hardware configuration required for each of the two HA implementations: Quorum-based Storage, and shared storage using NFS.

Hardware Configuration for Quorum-based Storage

In order to deploy an HA cluster using Quorum-based Storage, you should prepare the following:
  • NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.
  • JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons can reasonably be collocated on machines running other Hadoop daemons, for example the NameNodes, the JobTracker, or the YARN ResourceManager.
  • Cloudera recommends that you deploy the JournalNode daemons on the "master" host or hosts (NameNode, Standby NameNode, JobTracker, etc.) so the JournalNodes' local directories can use the reliable local storage on those machines. You should not use SAN or NAS storage for these directories.
  • There must be at least three JournalNode daemons, since edit log modifications must be written to a majority of JournalNodes. This allows the system to tolerate the failure of a single machine. You can also run more than three JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JournalNodes (three, five, seven, and so on). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally. If the requisite quorum is not available, the NameNode will not format or start, and you will see an error similar to this:
12/10/01 17:34:18 WARN namenode.FSEditLog: Unable to determine input streams from QJM to [10.0.1.10:8485, 10.0.1.10:8486, 10.0.1.10:8487]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
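As a quick check of the quorum arithmetic above, the following sketch (an illustration only, not part of Hadoop) computes how many JournalNode failures a deployment of a given size can tolerate:

```python
def tolerable_failures(num_journalnodes: int) -> int:
    """With N JournalNodes, edits must reach a majority, so the
    system keeps functioning through at most (N - 1) // 2 failures."""
    if num_journalnodes < 3:
        raise ValueError("HA requires at least three JournalNodes")
    return (num_journalnodes - 1) // 2

# Even counts add no tolerance over the next-lower odd count,
# which is why an odd number of JournalNodes is recommended.
for n in (3, 4, 5, 7):
    print(n, tolerable_failures(n))
# 3 -> 1, 4 -> 1, 5 -> 2, 7 -> 3
```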
  Note:

In an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. If you are reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled, you can reuse the hardware which you had previously dedicated to the Secondary NameNode.
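Once the JournalNode hosts are chosen, they are named in hdfs-site.xml via the shared edits URI. A minimal sketch, assuming three JournalNodes on hypothetical hosts jn1, jn2, and jn3 and a nameservice called mycluster:

```xml
<!-- hdfs-site.xml (sketch; hostnames and the nameservice name are assumptions) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<property>
  <!-- Local directory on each JournalNode host; use reliable local
       storage, not SAN or NAS, as recommended above. -->
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/1/dfs/jn</value>
</property>
```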

Hardware Configuration for Shared Storage Using NFS


If you are configuring HA with Quorum-based Storage, do not use this section; see Hardware Configuration for Quorum-based Storage instead.

In order to deploy an HA cluster with shared storage using NFS, you should prepare the following:
  • NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.
  • Shared storage - you will need a shared directory to which both NameNode machines have read/write access. Typically, this is a remote filer that supports NFS and is mounted on each of the NameNode machines. In this release, only one shared edits directory is supported. The availability of the system is limited by the availability of this shared edits directory, so to remove all single points of failure the shared edits directory must itself be redundant: there must be multiple network paths to the storage, and redundancy in the storage itself (disk, network, and power). Because of this, Cloudera recommends that the shared storage server be a high-quality dedicated NAS appliance rather than a simple Linux server. For more information, see Configuring a Remote NameNode Storage Directory.
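In the NFS case, the shared edits directory appears to HDFS as an ordinary local path in hdfs-site.xml. A minimal sketch, assuming the filer is mounted at a hypothetical path /mnt/filer1/dfs/ha-name-dir-shared on both NameNode machines:

```xml
<!-- hdfs-site.xml (sketch; the mount path is an assumption) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
</property>
```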
  Note:

As in a Quorum-based Storage deployment, the Standby NameNode performs checkpoints of the namespace state, so it is not necessary (and would in fact be an error) to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. If you are reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled, you can reuse the hardware previously dedicated to the Secondary NameNode.