This is the documentation for Cloudera Manager 5.1.0.
Documentation for other versions is available at Cloudera Documentation.

The Impala Service

You can install Cloudera Impala through the Cloudera Manager installation wizard, using either parcels or packages, and have the service created and started as part of the first run installation wizard. See Installing Impala.

If you elect not to include the Impala service using the installation wizard, you can use the Add Service wizard to perform the installation. The wizard automatically configures and starts the dependent services and the Impala service. See Adding a Service for instructions.

For further information on the Impala service, see:

Configuring the Impala Service

There are several types of configuration settings you may need to apply, depending on your situation.

Running Impala with CDH 4.1

If you are running CDH 4.1, and the Bypass Hive Metastore Server option is enabled, do the following:
  1. Go to the Impala service.
  2. Click the Configuration tab.
  3. Select Impala Daemon Default Group > Advanced.
  4. Add the following to the Impala Advanced Configuration Snippet for hive-site.xml property, replacing hive_metastore_server_host with the name of your Hive Metastore Server host:
    <property>
      <name>hive.metastore.local</name>
      <value>false</value> 
    </property> 
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://hive_metastore_server_host:9083</value> 
    </property>
    
  5. Click Save Changes.
  6. Restart the Impala service.
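
The snippet in step 4 can also be generated rather than typed by hand. The following is a minimal sketch, assuming Python 3.9 or higher; the helper name hive_site_snippet and the host name metastore01.example.com are hypothetical and only illustrate the substitution of hive_metastore_server_host.

```python
import xml.etree.ElementTree as ET


def hive_site_snippet(metastore_host, port=9083):
    """Return the two <property> elements from step 4 as text.

    metastore_host replaces the hive_metastore_server_host placeholder;
    9083 is the port used in the documented example.
    """
    settings = [
        ("hive.metastore.local", "false"),
        ("hive.metastore.uris", f"thrift://{metastore_host}:{port}"),
    ]
    chunks = []
    for name, value in settings:
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        ET.indent(prop, space="  ")  # pretty-print; needs Python 3.9+
        chunks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(chunks)


# Hypothetical host name, for illustration only.
print(hive_site_snippet("metastore01.example.com"))
```

The printed text is what you would paste into the Impala Advanced Configuration Snippet for hive-site.xml property.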

Enabling the Sentry Service for Impala

  Important: Ensure you have unchecked the Enable Sentry Authorization using Policy Files configuration property for both Hive and Impala under the Service-Wide > Policy File Based Sentry category.
To use the Sentry service:
  1. Enable the Sentry service for Hive. For details on how to do this, see Enabling the Sentry Service for Hive.
  2. Go to the Impala service.
  3. Click the Configuration tab.
  4. In the Service-Wide category, set the Sentry Service property to Sentry.
  5. Restart Impala.

Enabling Sentry Authorization using Policy Files for Impala

  1. Enable Sentry's policy file based authorization for Hive. For details on how to do this, see Enabling Sentry Authorization using Policy Files.
  2. Go to the Impala service.
  3. Click the Configuration tab.
  4. Under the Service-Wide category, go to the Policy File Based Sentry section.
  5. Check Enable Sentry Authorization Using Policy Files, then click Save Changes.
  6. Restart the Impala service.

Configuring Table Statistics

Configuring table statistics is highly recommended when using Impala. Statistics allow Impala to make optimizations that can result in significant (over 10x) performance improvement for some joins. If statistics are not available, Impala still functions, but with lower performance.

The Impala implementation to compute table statistics is available in CDH 5.0.0 or higher and in Impala version 1.2.2 or higher. The Impala implementation of COMPUTE STATS requires no setup steps and is preferred over the Hive implementation. See Table Statistics. If you are running an older version of Impala, follow the procedure in Hive Table Statistics.
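
As an illustration of the Impala implementation, the statement takes the form COMPUTE STATS followed by a table name. The following is a minimal sketch that builds one statement per table; the function name, the conservative identifier check, and the database and table names (sales, orders, customers) are assumptions made for the example.

```python
import re

# Conservative identifier check (an assumption for this sketch, not
# Impala's full identifier grammar).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def compute_stats_statements(database, tables):
    """Build one COMPUTE STATS statement per table in the database."""
    for name in [database, *tables]:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [f"COMPUTE STATS {database}.{table};" for table in tables]


# Hypothetical database and table names.
for statement in compute_stats_statements("sales", ["orders", "customers"]):
    print(statement)
```

The resulting statements can be run from any Impala client, for example by saving them to a file and passing it to impala-shell.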

Impala Llama ApplicationMaster

The Impala Llama ApplicationMaster (Llama) role reserves and releases YARN-managed resources for Impala, thus reducing resource management overhead when performing Impala queries. For further information, see Managing Resources.

Adding the Llama Role

The Llama role is not created by default when you add an Impala service. To add the Llama role:
  1. Manually enable cgroup-based resource management:
    1. In the top navigation bar, click Hosts.
    2. Click the Configuration tab.
    3. Expand Resource Management.
    4. Check the Enable Cgroup-based Resource Management checkbox.
    5. Click Save Changes.
  2. Optionally configure one or more dynamic resource pools for YARN. If you do not configure pools, queries use the default pool or a pool named for the users who submit the queries.
  3. Configure YARN resource management properties:
    1. Go to the YARN service.
    2. Click the Configuration tab.
    3. Select Service-Wide > Resource Management.
    4. Check the Use CGroups for Resource Management and Always use Linux Container Executor properties.
    5. Click Save Changes.
    6. Select ResourceManager Default Group > Resource Management.
    7. Set the Container Memory Minimum and Container Virtual CPU Cores Minimum properties to 0.
    8. Click Save Changes.
    9. Select NodeManager Default Group > Resource Management.
    10. Record the value of the Container Memory property.
  4. Configure Impala resource management properties:
    1. Go to the Impala service.
    2. Click the Configuration tab.
    3. Click Resource Management.
    4. Select Service-Wide > YARN Service for Resource Management.
    5. Set it to the YARN service.
    6. Select Impala Daemon Default Group > Resource Management.
    7. Set the Impala Daemon Memory Limit property to the value of the Container Memory property that you recorded in step 3, substep 10.
    8. Click Save Changes.
  5. Add and configure the Llama role:
    1. Click the Instances tab.
    2. Click the Add Role Instances button.
    3. Select a host in the column under Impala Llama ApplicationMaster, then click OK.
    4. Click Continue.
    5. Click the Configuration tab.
    6. Click Impala Llama ApplicationMaster Default Group.
    7. In the Core Queues property, enter the pools you created in step 2, if any.
    8. Click Save Changes.
  6. Restart services and redeploy client configurations:
    1. Click in the top right.
    2. Click Restart Cluster.
    3. Click Restart Now.
    4. Click Finish.

Configuring Llama for High Availability

Llama High Availability (HA) uses an Active/Standby architecture, in which the active Llama is automatically elected using the ZooKeeper-based ActiveStandbyElector. The active Llama accepts RPC/Thrift connections and communicates with YARN. The standby Llama monitors the leader information in ZooKeeper, but doesn't accept RPC/Thrift connections.

Only one of the Llamas should be active to ensure the resources are not partitioned. Llama uses ZooKeeper Access Control Lists (ACLs) to claim exclusive ownership of the cluster when transitioning to active, and monitors this ownership periodically. If another Llama takes over, the first one realizes it within this period.

To claim resources from YARN, Llama spawns YARN applications and runs unmanaged ApplicationMasters. When a Llama goes down, the resources allocated to all the YARN applications spawned by it are not reclaimed until YARN times out those applications (the default timeout is 10 minutes). After a Llama failure, a Llama reclaims these resources by killing any YARN applications spawned by this pair of Llamas.

To configure Llama for High Availability:
  1. Go to the Impala service.
  2. Add a Llama role instance.
  3. Click the Configuration tab.
  4. Expand the Impala Llama ApplicationMaster Default Group > Advanced category.
  5. In the Impala Llama ApplicationMaster Advanced Configuration Snippet (Safety Valve) for llama-site.xml property, configure the following properties:
    | Property | Description | Default | Recommended |
    |----------|-------------|---------|-------------|
    | llama.am.cluster.id | Cluster ID of the Llama pair, used to differentiate between different Llamas | llama | [cluster-specific] |
    | llama.am.ha.enabled* | Whether to enable Llama HA | false | true |
    | llama.am.ha.zk-quorum* | ZooKeeper quorum to use for leader election and fencing | | [cluster-specific] |
    | llama.am.ha.zk-base | Base znode for leader election and fencing data | /llama | [cluster-specific] |
    | llama.am.ha.zk-timeout-ms | The session timeout, in milliseconds, for connections to ZooKeeper quorum | 10000 | 10000 |
    | llama.am.ha.zk-acl | ACLs to control access to ZooKeeper | world:anyone:rwcda | [cluster-specific] |
    | llama.am.ha.zk-auth | Authorization information to go with the ACLs | | [cluster-acl-specific] |

    *Required configurations

    You must enter property values in XML format. For example:
    <property>
      <name>llama.am.cluster.id</name>
      <value>llama</value>
    </property>
  6. Expand the Impala Daemon Default Group > Advanced category.
  7. Specify command-line flags as one key-value pair per line in the Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve) property. The supported flags are:
    • -llama_addresses: Comma-separated list of hostname:port items, specifying all the members of the Llama availability group. Defaults to "127.0.0.1:15000".
    • -llama_max_request_attempts: Maximum number of times a request to reserve, expand, or release resources is retried until the request is cancelled. Attempts are only counted after Impala is registered with Llama. That is, a request survives at most llama_max_request_attempts-1 re-registrations. Defaults to 5.
    • -llama_registration_timeout_secs: Maximum number of seconds that Impala will attempt to register or re-register with Llama. If registration is unsuccessful, Impala cancels the action with an error, which could result in an impalad startup failure or a cancelled query. A setting of -1 means try indefinitely. Defaults to 30.
    • -llama_registration_wait_secs: Number of seconds to wait between attempts during Llama registration. Defaults to 3.
    For example:
    -llama_addresses=host1:15000,host2:15000
    -llama_max_request_attempts=10
  8. Click Save Changes.
  9. Restart services and redeploy client configurations:
    1. Click in the top right.
    2. Click Restart Cluster.
    3. Click Restart Now.
    4. Click Finish.
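
The values entered in steps 5 and 7 can be prepared with a short script. The following is a minimal sketch that renders llama-site.xml properties in the XML format shown in step 5 and command-line flags as one key-value pair per line, as in step 7. The function names and the ZooKeeper host names are hypothetical; the check for required properties covers only the two properties marked with an asterisk in the table.

```python
from xml.sax.saxutils import escape

# The two properties marked as required (*) in the table in step 5.
REQUIRED = ("llama.am.ha.enabled", "llama.am.ha.zk-quorum")


def llama_site_snippet(properties):
    """Render properties as <property> blocks for the llama-site.xml snippet."""
    missing = [name for name in REQUIRED if not properties.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    blocks = []
    for name, value in properties.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(str(value))}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


def impalad_flag_lines(flags):
    """Render Impala Daemon flags as one -key=value pair per line."""
    return "\n".join(f"-{name}={value}" for name, value in flags.items())


# Placeholder ZooKeeper hosts; replace with your own quorum.
ha_properties = {
    "llama.am.cluster.id": "llama",
    "llama.am.ha.enabled": "true",
    "llama.am.ha.zk-quorum": "zk1.example.com:2181,zk2.example.com:2181",
}
print(llama_site_snippet(ha_properties))
print(impalad_flag_lines({
    "llama_addresses": "host1:15000,host2:15000",
    "llama_max_request_attempts": 10,
}))
```

Raising an error when a required property is absent catches an incomplete HA configuration before it is pasted into Cloudera Manager.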

Impala Web Servers

Enabling and Disabling Access to Impala Web Servers

By default, access to the Impala Daemon and StateStore web servers is enabled. To change this setting for each role:
  • Impala StateStore
    1. Go to the Impala service.
    2. Click the Configuration tab.
    3. Select Impala StateStore Default Group.
    4. Check or uncheck Enable StateStore Web Server.
    5. Click Save Changes.
    6. Restart the Impala service.
  • Impala Daemon
    1. Go to the Impala service.
    2. Click the Configuration tab.
    3. Select Impala Daemon Default Group > Ports and Addresses.
    4. Check or uncheck Enable Impala Daemon Web Server.
    5. Click Save Changes.
    6. Restart the Impala service.

Opening Impala Web Server UIs

  • Impala StateStore
    1. Go to the Impala service.
    2. Select Web UI > Impala StateStore Web UI.
  • Impala Daemon
    1. Go to the Impala service.
    2. Click the Instances tab.
    3. Click an Impala Daemon instance.
    4. Click Impala Daemon Web UI.
  • Impala Catalog Server
    1. Go to the Impala service.
    2. Select Web UI > Impala Catalog Web UI.
  • Impala Llama ApplicationMaster
    1. Go to the Impala service.
    2. Click the Instances tab.
    3. Click an Impala Llama ApplicationMaster instance.
    4. Click Llama Web UI.

Configuring Secure Access for Impala Web Servers

Cloudera Manager supports two methods of authentication for secure access to the Impala Catalog Server, Daemon, and StateStore web servers: password-based authentication and SSL certificate authentication. Both of these can be configured through properties of the Impala Catalog Server, Daemon, and StateStore. Authentication for the three types of daemons can be configured independently.

Configuring Password Authentication

  1. Go to the Impala service.
  2. Click the Configuration tab.
  3. Search for "password" using the Search box within the Configuration page. This should display the password-related properties (Username and Password properties) for the Impala Catalog Server, Daemon, and StateStore. If there are multiple role groups configured for Impala Daemon instances, the search should display all of them.
  4. Enter a username and password into these fields.
  5. Click Save Changes.
  6. Restart the Impala service.

Now when you access the Web UI for the Impala Catalog Server, Daemon, and StateStore, you are asked to log in before access is granted.

Configuring SSL Certificate Authentication

  1. Create or obtain an SSL certificate.
  2. Place the certificate, in .pem format, on the hosts where the Impala Catalog Server and StateStore are running, and on each host where an Impala Daemon is running. It can be placed in any location (path) you choose. If all the Impala Daemons are members of the same role group, then the .pem file must have the same path on every host.
  3. Go to the Impala service page.
  4. Click the Configuration tab.
  5. Search for "certificate" using the Search box within the Configuration page. This should display the certificate file location properties for the Impala Catalog Server, Daemon, and StateStore. If there are multiple role groups configured for Impala Daemon instances, the search should display all of them.
  6. In the property fields, enter the full path name to the certificate file.
  7. Click Save Changes.
  8. Restart the Impala service.

Now when you access the Web UI for the Impala Catalog Server, Daemon, and StateStore, the connection uses HTTPS.
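
Before entering the path in step 6, it can help to confirm that the file on each host is in .pem format. The following is a minimal sketch of such a check; it only looks for the standard PEM certificate markers and does not validate the certificate itself. The function name and the file name impala-web.pem are hypothetical.

```python
import tempfile
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def looks_like_pem_certificate(path):
    """Return True if the file is readable and contains PEM certificate markers."""
    try:
        text = Path(path).read_text(encoding="ascii")
    except (OSError, UnicodeDecodeError):
        return False
    begin = text.find(PEM_BEGIN)
    end = text.find(PEM_END)
    return begin != -1 and end > begin


# Demonstration with a temporary file and a placeholder body
# (not a real certificate).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "impala-web.pem"
    sample.write_text(f"{PEM_BEGIN}\nplaceholder\n{PEM_END}\n", encoding="ascii")
    print(looks_like_pem_certificate(sample))
```

Run the check on every host that holds a copy of the certificate, using the same path you plan to enter in the property fields.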