This is the documentation for CDH 4.7.0.
Documentation for other versions is available at Cloudera Documentation.

Configuring Flume's Security Properties

Writing as a single user for all HDFS sinks in a given Flume agent

The Hadoop services require a three-part principal that has the form of username/fully.qualified.domain.name@YOUR-REALM.COM. Cloudera recommends using flume as the first component and the fully qualified domain name of the host machine as the second. Assuming that Kerberos and security-enabled Hadoop have been properly configured on the Hadoop cluster itself, you must add the following parameters to the Flume agent's flume.conf configuration file, which is typically located at /etc/flume-ng/conf/flume.conf:

agentName.sinks.sinkName.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@YOUR-REALM.COM
agentName.sinks.sinkName.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab

where:

agentName is the name of the Flume agent being configured, which in this release defaults to the value "agent". sinkName is the name of the HDFS sink that is being configured. The respective sink's type must be HDFS.

In the previous example, flume is the first component of the principal name, fully.qualified.domain.name is the second, and YOUR-REALM.COM is the name of the Kerberos realm your Hadoop cluster is in. The /etc/flume-ng/conf/flume.keytab file contains the keys necessary for flume/fully.qualified.domain.name@YOUR-REALM.COM to authenticate with other services.

Flume and Hadoop also provide a simple keyword, _HOST, that gets expanded to be the fully qualified domain name of the host machine where the service is running. This allows you to have one flume.conf file with the same hdfs.kerberosPrincipal value on all of your agent host machines.

agentName.sinks.sinkName.hdfs.kerberosPrincipal = flume/_HOST@YOUR-REALM.COM

Writing as different users across multiple HDFS sinks in a single Flume agent

In this release, support has been added for secure impersonation of Hadoop users (similar to "sudo" in UNIX). This is implemented in a way similar to how Oozie implements secure user impersonation.

The following steps to set up secure impersonation from Flume to HDFS assume your cluster is configured using Kerberos. (However, impersonation also works on non-Kerberos secured clusters, and Kerberos-specific aspects should be omitted in that case.)

  1. Configure Hadoop to allow impersonation. Add the following configuration properties to your core-site.xml.
    <property>
         <name>hadoop.proxyuser.flume.groups</name>
         <value>group1,group2</value>
         <description>Allow the flume user to impersonate any members of group1 and group2</description>
    </property>
    <property>
         <name>hadoop.proxyuser.flume.hosts</name>
         <value>host1,host2</value>
         <description>Allow the flume user to connect only from host1 and host2 to impersonate a user</description>
    </property>
    You can use the wildcard character * to enable impersonation of any user from any host. For more information, see Secure Impersonation.
  2. Set up a Kerberos keytab for the Kerberos principal and host Flume is connecting to HDFS from. This user must match the Hadoop configuration in the preceding step. For instructions, see Configuring Hadoop Security in CDH4.
  3. Configure the HDFS sink with the following configuration options:
  4. hdfs.kerberosPrincipal - fully-qualified principal. Note: _HOST will be replaced by the hostname of the local machine (only in-between the / and @ characters)
  5. hdfs.kerberosKeytab - location on the local machine of the keytab containing the user and host keys for the above principal
  6. hdfs.proxyUser - the proxy user to impersonate

Example snippet (the majority of the HDFS sink configuration options have been omitted):

agent.sinks.sink-1.type = HDFS
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/_HOST@YOUR-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = weblogs

agent.sinks.sink-2.type = HDFS
agent.sinks.sink-2.hdfs.kerberosPrincipal = flume/_HOST@YOUR-REALM.COM
agent.sinks.sink-2.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-2.hdfs.proxyUser = applogs

In the above example, the flume Kerberos principal impersonates the user weblogs in sink-1 and the user applogs in sink-2. This will only be allowed if the Kerberos KDC authenticates the specified principal (flume in this case), and the if NameNode authorizes impersonation of the specified proxy user by the specified principal.

Limitations

At this time, Flume does not support using multiple Kerberos principals or keytabs in the same agent. Therefore, if you want to create files as multiple users on HDFS, then impersonation must be configured, and exactly one principal must be configured in Hadoop to allow impersonation of all desired accounts. In addition, the same keytab path must be used across all HDFS sinks in the same agent. If you attempt to configure multiple principals or keytabs in the same agent, Flume will emit the following error message:

Cannot use multiple kerberos principals in the same agent. Must restart agent to use new principal or keytab.