This is the documentation for CDH 4.7.0.

Apache Hive

  Note: This version of Hive includes a preview of an advisory authorization feature which is not yet complete and cannot be deployed at present.

— PostgreSQL 9.0+ requires additional configuration

The Hive metastore will not start if you use a version of PostgreSQL later than 9.0 with its default configuration. You will see output similar to the following in the log:
Caused by: javax.jdo.JDODataStoreException: Error executing JDOQL query "SELECT "THIS"."TBL_NAME" AS NUCORDER0 FROM "TBLS" "THIS" LEFT OUTER JOIN "DBS" "THIS_DATABASE_NAME" ON "THIS"."DB_ID" = "THIS_DATABASE_NAME"."DB_ID" WHERE "THIS_DATABASE_NAME"."NAME" = ? AND (LOWER("THIS"."TBL_NAME") LIKE ? ESCAPE '\\' ) ORDER BY NUCORDER0 " : ERROR: invalid escape string
 Hint: Escape string must be empty or one character..
NestedThrowables:
org.postgresql.util.PSQLException: ERROR: invalid escape string
 Hint: Escape string must be empty or one character.
 at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:252)
 at org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:759)
 ... 28 more
Caused by: org.postgresql.util.PSQLException: ERROR: invalid escape string
 Hint: Escape string must be empty or one character.
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
 at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
 at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386)
 at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:271)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
 at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
 at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:457)
 at org.datanucleus.store.rdbms.query.legacy.SQLEvaluator.evaluate(SQLEvaluator.java:123)
 at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.performExecute(JDOQLQuery.java:288)
 at org.datanucleus.store.query.Query.executeQuery(Query.java:1657)
 at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
 at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
 at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
 ... 29 more 

The problem is caused by a backward-incompatible change in the default value of the standard_conforming_strings property. Versions up to and including PostgreSQL 9.0 defaulted to off, but starting with version 9.1 the default is on.

Bug: None Severity: Low

Resolution: None; use workaround.

Workaround: As the administrator user, use the following command to turn standard_conforming_strings off:
ALTER DATABASE <hive_db_name> SET standard_conforming_strings = off; 
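For example, if the metastore database is named metastore (the name varies by installation), you could run the statement through psql as the PostgreSQL administrative user:
sudo -u postgres psql -c "ALTER DATABASE metastore SET standard_conforming_strings = off;"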

— Problems affecting access to Hive metastore after upgrade from CDH3

After upgrading from CDH3 to CDH4.2 or later and upgrading the metastore to Hive 0.10, you may see metastore errors. Specifically, if the metastore.SDS table's location column contains EXTERNAL Hive table definitions whose HDFS URIs have no port specification, LOAD DATA INPATH produces an error such as the following:
FAILED: SemanticException [Error 10028]: Line  1:18 Path is not legal <path> Please check that values for params  "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
The cause is that Hive 0.10 expects the port number to be in the metastore.SDS.location column entries.

Bug: None.

Severity: Medium

Workaround: Use the following MySQL command to update every row in the SDS table and to add the port number to the HDFS URI:
UPDATE metastore.SDS SET location = replace(location, '$HOSTNAME/', '$HOSTNAME:8020/')
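For example, if the stored URIs use the NameNode host namenode.example.com (a placeholder; substitute your own host and port), the command would be:
UPDATE metastore.SDS SET location = replace(location, 'hdfs://namenode.example.com/', 'hdfs://namenode.example.com:8020/');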

— Beeline shell may hang or report an out-of-memory exception when running a query that generates large results

Beeline has two modes of retrieving data, a buffered mode and an incremental mode. In buffered mode (the default) it fetches the entire result set and buffers it in memory before displaying it. If the result set is too large to fit the current JVM heap, the operation fails. In incremental mode, result rows are displayed as soon as they are fetched; this provides lower latency and memory usage at the price of extra display column padding.

Bug: None.

Severity: Low

Resolution: None; this is the inherent behavior of buffered mode. Use workaround.

Workaround: Switch to incremental mode in Beeline as follows: !set incremental true.
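For example, assuming HiveServer2 is running on localhost and the table name below is a placeholder, a Beeline session might look like this:
beeline -u jdbc:hive2://localhost:10000
!set incremental true
SELECT * FROM large_table;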

— Queries spawned from MapReduce jobs in MRv1 fail if mapreduce.framework.name is set to yarn

Queries spawned from MapReduce jobs fail in MRv1 with a null pointer exception (NPE) if /etc/mapred/conf/mapred-site.xml contains the following:
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property> 

Bug: None

Severity: High

Resolution: Use workaround

Workaround: Remove the mapreduce.framework.name property from mapred-site.xml.

— Hive, Pig, and Sqoop fail in MRv1 tarball installation because /usr/bin/hbase sets HADOOP_MAPRED_HOME to MR2

This problem affects tarball installations only.

Bug: None

Severity: High

Resolution: Use workaround

Workaround: If you are using MRv1, edit the following line in /etc/default/hadoop from
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce 
to
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce 

In addition, /usr/lib/hadoop-mapreduce must not exist in HADOOP_CLASSPATH.
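One rough way to verify this (assuming HADOOP_CLASSPATH is set in your environment) is to list its entries and look for the MR2 directory:
echo $HADOOP_CLASSPATH | tr ':' '\n' | grep hadoop-mapreduce
If this prints /usr/lib/hadoop-mapreduce, remove that entry wherever HADOOP_CLASSPATH is set.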

— Commands run against an Oracle-backed metastore may fail

Commands run against an Oracle-backed metastore fail with the following error:
javax.jdo.JDODataStoreException Incompatible data type for column TBLS.VIEW_EXPANDED_TEXT : was CLOB (datastore), but type expected was LONGVARCHAR (metadata). Please check that the type in the datastore and the type specified in the MetaData are consistent.

This error may occur if the metastore is run on top of an Oracle database with the configuration property datanucleus.validateColumns set to true.

Bug: None

Severity: Low

Resolution: None; use workaround.

Workaround: Set datanucleus.validateColumns=false in the hive-site.xml configuration file.
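For example, add the following property to hive-site.xml:
<property>
 <name>datanucleus.validateColumns</name>
 <value>false</value>
</property>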

— Hive Web Interface not supported

Cloudera no longer supports the Hive Web Interface because of inconsistent upstream maintenance of this project.

Bug: DISTRO-77

Severity: Low

Resolution: None; use workaround.

Workaround: Use Hue and Beeswax instead of the Hive Web Interface.

— Hive may need additional configuration to work in a Federated HDFS cluster

Hive jobs normally move data from a temporary directory to a warehouse directory during execution. Hive uses /tmp as its temporary directory by default, and users usually configure /user/hive/warehouse/ as the warehouse directory. Under Federated HDFS, /tmp and /user are configured as ViewFS mount tables, and so the Hive job will actually try to move data between two ViewFS mount tables. Federated HDFS does not support this, and the job will fail with the following error:
Failed with exception Renames across Mount points not supported 

Bug: None

Severity: Low

Resolution: No software fix planned; use the workaround.

Workaround: Modify /etc/hive/conf/hive-site.xml to allow the temporary directory and warehouse directory to use the same ViewFS mount table. For example, if the warehouse directory is /user/hive/warehouse, add the following property to /etc/hive/conf/hive-site.xml so both directories use the ViewFS mount table for /user.
<property>
 <name>hive.exec.scratchdir</name>
 <value>/user/${user.name}/tmp</value>
</property> 

— Inserting data into an Avro-backed Hive table may fail on a Federated HDFS cluster

Inserting data into an Avro-backed Hive table fails on a federated HDFS cluster if avro.schema.url is a ViewFS or HDFS URL.

Bug: None

Severity: Low

Resolution: No software fix planned; use the workaround.

Workaround: Use an HTTP URL.
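For example, the table can reference its schema over HTTP; the table name, schema host, and schema file below are placeholders:
CREATE TABLE my_avro_table
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES ('avro.schema.url'='http://schema-host.example.com/schemas/my_table.avsc');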

— Cannot create archive partitions with external HAR (Hadoop Archive) tables

ALTER TABLE ... ARCHIVE PARTITION is not supported on external tables.

Bug: None

Severity: Low

Resolution: None planned

Workaround: None

— Setting hive.optimize.skewjoin to true causes long running queries to fail

Bug: HIVE-5000

Severity: Low

Workaround: None

— JDBC - executeUpdate does not return the number of rows modified

Contrary to the documentation, the executeUpdate method always returns zero.

Bug: None

Severity: Low

Workaround: None

— Files in Avro tables no longer have .avro extension

As of CDH 4.3.0 Hive no longer creates files in Avro tables with the .avro extension by default. This does not cause any problems in Hive, but could affect downstream components such as Pig, MapReduce, or Sqoop that expect files with the .avro extension.

Bug: None

Severity: Low

Workaround: Manually set the extension to .avro before using a query that inserts data into your Avro table. Use the following set statement:

set hive.output.file.extension=".avro";
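For example, in a Hive session (the table names below are placeholders):
set hive.output.file.extension=".avro";
INSERT OVERWRITE TABLE my_avro_table SELECT * FROM staging_table;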

— Table statistics are not configured correctly out of the box

Bug: None

Severity: Medium

Workaround: If you are using Hive with Impala, configure table statistics manually as explained in the Impala documentation. Otherwise, disable collection of table statistics by modifying hive-site.xml on all possible clients as follows:
<property>
  <name>hive.stats.autogather</name>
  <value>false</value>
</property>

— decimal type is not accepted by built-in UDFs

The decimal data type is not supported as an argument for built-in UDFs such as avg().

Bug: HIVE-4128

Severity: Medium

Workaround: None

— decimal type is rendered in scientific notation

The decimal data type is rendered in scientific notation rather than as a decimal number.

Bug: None

Severity: Medium

Workaround: None

— Querying Avro tables with empty partition folders raises exceptions

Bug: None

Severity: High

Workaround: None

— Hive cannot access a timestamp field in a Parquet table

Bug: None

Severity: Medium

Workaround: Use type int or string for the field.

— Hive creates an invalid table if you specify more than one partition with alter table

Hive (in all known versions from 0.7) allows you to configure multiple partitions with a single alter table command, but the configuration it creates is invalid for both Hive and Impala.

Bug: None

Severity: Medium

Workaround:

Correct results can be obtained by configuring each partition with its own alter table command in either Hive or Impala. For example, the following:
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808' PARTITION 
(dt='2008-08-09', country='us') location '/path/to/us/part080809';
should be replaced with:
ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808'; 
ALTER TABLE page_view ADD PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';