Introduction to Hadoop Security
The security features in CDH 5 enable Hadoop to prevent malicious user impersonation. The Hadoop daemons use Kerberos to authenticate users on all remote procedure calls (RPCs). Group resolution is performed on the Hadoop master nodes (NameNode, JobTracker, and ResourceManager), so users cannot manipulate their group membership. Map tasks run under the user account of the user who submitted the job, which isolates them from tasks submitted by other users. In addition to these features, new authorization mechanisms have been introduced in HDFS and MapReduce to give more control over user access to data.
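The following is a minimal sketch, not Hadoop's actual code, of why resolving groups on the master node prevents users from manipulating group membership. The class name GroupResolutionSketch, its methods, and the sample users and groups are all hypothetical and exist only for illustration. The sketch assumes the user name has already been authenticated (in CDH 5 that is the role of Kerberos) and shows the server ignoring any group list the client supplies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of server-side group resolution (hypothetical, not a Hadoop class). */
final class GroupResolutionSketch {

    /** Server-held mapping from authenticated user name to groups. */
    private final Map<String, Set<String>> serverDirectory = new HashMap<>();

    /** Registers a user and its groups in the server's own directory. */
    void addUser(String user, String... groups) {
        Set<String> set = new HashSet<>();
        Collections.addAll(set, groups);
        serverDirectory.put(user, set);
    }

    /**
     * Decides access using only the server's mapping.
     * The clientClaimedGroups parameter is accepted solely to show
     * that it is deliberately never consulted.
     */
    boolean isAllowed(String authenticatedUser,
                      Set<String> clientClaimedGroups,
                      String requiredGroup) {
        Set<String> resolved =
            serverDirectory.getOrDefault(authenticatedUser, Collections.emptySet());
        return resolved.contains(requiredGroup);
    }

    public static void main(String[] args) {
        GroupResolutionSketch server = new GroupResolutionSketch();
        server.addUser("alice", "analysts");
        server.addUser("mallory", "interns");

        // mallory forges a group claim on the client side.
        Set<String> forged = new HashSet<>();
        forged.add("analysts");

        System.out.println(server.isAllowed("alice", Collections.emptySet(), "analysts")); // true
        System.out.println(server.isAllowed("mallory", forged, "analysts"));               // false
    }
}
```

Because the decision depends only on the server's directory, a forged claim from the client has no effect. The same reasoning is why the threat model requires that users not have root access to cluster machines, where such a mapping would live.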
The security features in CDH 5 meet the needs of most Hadoop customers because the cluster is typically accessible only to trusted personnel. In particular, Hadoop's current threat model assumes that users cannot:
- Have root access to cluster machines.
- Have root access to shared client machines.
- Read or modify packets on the network of the cluster.
CDH 5 supports encryption of all user data sent over the network. For configuration instructions, see Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport.
Note also that there is no built-in support for on-disk encryption.