Installing Impala with Cloudera Manager Free Edition
To use Cloudera Impala, you must install both CDH and Impala on the nodes that will run Impala. Hive is also required; it is installed with CDH.
Follow the instructions in Automated Installation of Cloudera Manager and CDH.
The pre-packaged installer wizard installs Cloudera Manager and CDH, and gives you the option to install Impala as part of the same process. If you elect to install Cloudera Impala, this method installs all the necessary software, sets up the Hive metastore using the default PostgreSQL database, and starts the Impala service along with the other CDH and Cloudera Manager services. Within the installation wizard you can install Impala using either packages or parcels.
Once you have installed Impala, you can coordinate its use of cluster resources in relation to MapReduce needs for the same resources. See Setting up a Multi-tenant Cluster for Impala and MapReduce below, as well as Resource Management in the Cloudera Manager User Guide.
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<hive_metastore_server_host>:9083</value>
</property>
Otherwise, Impala queries will fail.
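Because Impala depends on reaching the metastore at the thrift URI above, a quick connectivity check can help rule out network problems. The following is a minimal sketch, not part of Cloudera Manager: the function name metastore_reachable is hypothetical, and it only confirms that a TCP connection to port 9083 can be opened, not that the service behind it is a working Hive metastore.

```python
import socket


def metastore_reachable(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This is only a network-level check (an assumption-laden sketch); it does
    not speak the thrift protocol or validate the metastore itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (replace the placeholder with your metastore host):
#   metastore_reachable("<hive_metastore_server_host>")
```

A False result suggests checking the host name, firewall rules, and whether the Hive Metastore Server is running before looking at the Impala configuration.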
Configuring Hive Table Stats
Configuring Hive Table Stats is highly recommended when using Impala. Table stats allow Impala to make optimizations that can improve performance significantly (by more than 10x) for some joins. If table stats are not available, Impala still functions, but with lower performance.
To configure Hive Table Stats:
Set up a MySQL server for transient Stats data.
Note that there is no PostgreSQL or Oracle option. This database will be heavily loaded, so it should not be installed on the same host as anything critical such as the Hive Metastore Server, the database hosting the Hive Metastore, or Cloudera Manager Server. When collecting stats on a large table and/or in a large cluster, this host may become slow or unresponsive.
For instructions on setting up MySQL, see "Installing and Configuring a MySQL Database" in the Cloudera Manager Enterprise Edition Installation Guide.
Add the following to the HiveServer2 Safety Valve for hive-site.xml:
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:mysql</value>
</property>
<property>
  <name>hive.stats.jdbcdriver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:mysql://<stats_mysql_host>:3306/<stats_db_name>?useUnicode=true&amp;characterEncoding=UTF-8&amp;user=<stats_user>&amp;password=<stats_password></value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/share/java/mysql-connector-java.jar</value>
</property>
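Ampersands in the JDBC connection string must be written as &amp; inside an XML value, which is easy to get wrong when editing by hand. The sketch below generates the four properties with the escaping applied. The function name stats_properties and its parameters are hypothetical, and it assumes the user name and password contain no characters that need URL encoding.

```python
from xml.sax.saxutils import escape


def stats_properties(host, db, user, password, port=3306):
    """Build the hive-site.xml property snippet for MySQL-backed table stats.

    Assumption: user and password are plain ASCII with no '&', '?', or other
    characters that would need URL encoding in the JDBC query string.
    """
    url = (
        f"jdbc:mysql://{host}:{port}/{db}"
        f"?useUnicode=true&characterEncoding=UTF-8&user={user}&password={password}"
    )
    props = {
        "hive.stats.dbclass": "jdbc:mysql",
        "hive.stats.jdbcdriver": "com.mysql.jdbc.Driver",
        "hive.stats.dbconnectionstring": url,
        "hive.aux.jars.path": "file:///usr/share/java/mysql-connector-java.jar",
    }
    # escape() turns '&' into '&amp;' so the snippet is well-formed XML.
    return "\n".join(
        f"<property>\n  <name>{name}</name>\n  <value>{escape(value)}</value>\n</property>"
        for name, value in props.items()
    )
```

Printing the result of stats_properties with your own host, database, and credentials gives text that can be pasted into the Safety Valve field.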
Collect stats on a particular table by running the following from a beeline client connected to your HiveServer2:
analyze table MY_TABLE compute statistics;
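When stats are needed for several tables, the statement above can be issued once per table. The following sketch builds one beeline invocation per table using beeline's -u (JDBC URL) and -e (query) options. The helper names, the example URL, and port 10000 are assumptions; substitute the address of your own HiveServer2, and note that table names are inserted as given, without validation.

```python
import subprocess


def analyze_commands(jdbc_url, tables):
    """Return one beeline argument list per table (nothing is executed here)."""
    return [
        ["beeline", "-u", jdbc_url, "-e", f"analyze table {table} compute statistics;"]
        for table in tables
    ]


def run_analyze(jdbc_url, tables):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in analyze_commands(jdbc_url, tables):
        subprocess.run(cmd, check=True)


# Example call (hypothetical host and table names):
#   run_analyze("jdbc:hive2://<hiveserver2_host>:10000", ["MY_TABLE", "OTHER_TABLE"])
```

Running the tables sequentially rather than in parallel keeps the load on the stats database lower, which matters given how heavily that host is used during collection.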