This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.


  1. The CDH4 cluster must have a MapReduce service running on it. This can be either MRv1 or YARN (MRv2).
  2. Every MapReduce node in the CDH4 cluster should have full network access to every node in the source cluster, so that the copy can run in a distributed manner.

The term source refers to the CDH3 (or other Hadoop) cluster you want to migrate or copy data from; destination refers to the CDH4 cluster.
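
The network access requirement above can be checked before starting a copy. The following is a minimal Python sketch, not part of CDH, that tests whether TCP connections can be opened from the node it runs on to a list of source cluster endpoints. The hostnames in `SOURCE_NODES` are hypothetical, and the ports (8020 for the NameNode, 50010 for DataNodes) are assumed defaults; substitute the hosts and ports your source cluster actually uses.

```python
import socket

# Hypothetical source cluster endpoints as (host, port) pairs.
# The ports are assumed defaults; adjust them to your source cluster's configuration.
SOURCE_NODES = [
    ("source-namenode.example.com", 8020),
    ("source-datanode1.example.com", 50010),
    ("source-datanode2.example.com", 50010),
]


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_nodes(nodes, timeout=3.0):
    """Return the (host, port) pairs that could not be reached from this machine."""
    return [(host, port) for host, port in nodes if not is_reachable(host, port, timeout)]
```

Running `unreachable_nodes(SOURCE_NODES)` on each MapReduce node of the destination cluster returns an empty list when every listed endpoint accepts a connection. A successful TCP connection only shows that the port is open; it does not confirm that the Hadoop versions or protocols are compatible.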