This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

Apache HBase

— HBase may be slightly slower after upgrading to CDH4.2.0

After upgrading to CDH 4.2.0, you may experience some performance impact, typically around 5%, depending on workload, what you are measuring, and other variables. This is due to some per-column-family metrics that were introduced in HBase 0.94.

Bug: HBASE-7868

Severity: High

Workaround: Enable HDFS short-circuit reads and the FAST_DIFF data block encoding introduced in HBASE-4218 to regain the lost performance. Data block encoding is new in CDH 4.2.0. With both options enabled, most workloads will see the same performance as, or a net improvement over, earlier CDH versions.

To create a new table with FAST_DIFF encoding, specify it as a FAMILY option:
 hbase> create 'tableName', {NAME => 'familyName', DATA_BLOCK_ENCODING => 'FAST_DIFF'} 
To enable FAST_DIFF on an existing table, disable the table and alter the family's block encoding:
 hbase> disable 'tableName'
hbase> alter 'tableName', {NAME => 'familyName', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
hbase> enable 'tableName' 
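Short-circuit reads, the other part of the workaround, are enabled in configuration rather than the shell. A minimal sketch of the relevant hbase-site.xml properties (the domain socket path here is an example; it must match what your DataNodes are configured with):

```xml
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
  <description>Allow the HBase client to read local blocks directly.</description>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
  <description>Example path; must match the DataNode configuration.</description>
</property>
```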

Note that if you decide to downgrade to CDH4.1 after enabling FAST_DIFF, you must first either change DATA_BLOCK_ENCODING back to NONE and trigger major compactions, or export your data with Export/Import or CopyTable. In either case you will have to rewrite all of your data.

To trigger major compactions:
 hbase> major_compact 'tableName'

— Node crash can result in data loss and long lease-recovery in CDH4.4.x

Bug: HBASE-8670

Severity: High

Resolution: None; use workaround

Workaround: In CDH4.4.x clusters, modify the configuration so that the lease recovery retry interval is greater than the HDFS RPC timeout, and shorten the HDFS timeouts themselves. For example, set the following values:
  • On the Region Server, in hbase-site.xml:
      <property>
        <name>dfs.client.read.shortcircuit.buffer.size</name>
        <value>131072</value>
        <description>Needed if short-circuit reads (SSR) are enabled.</description>
      </property>
      <property> 
        <name>hbase.lease.recovery.dfs.timeout</name>
        <value>23000</value>
        <description>How much time we allow to elapse between calls to recover the lease.
        Should be larger than the DFS timeout.</description>
      </property>
      <property>
        <name>dfs.client.socket-timeout</name>
        <value>10000</value>
        <description>Reduce the DFS timeout from 60 to 10 seconds.</description>
      </property> 
  • On the NameNode and DataNodes, in hdfs-site.xml:
      <property>
        <name>dfs.client.socket-timeout</name>
        <value>10000</value> 
        <description>Reduce the DFS timeout from 60 to 10 seconds.</description>
      </property> 
      <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>10000</value> 
        <description>Reduce the DFS timeout from 8 * 60 to 10 seconds.</description>
      </property> 
      <property>
        <name>ipc.client.connect.timeout</name>
        <value>3000</value> 
        <description>Reduce from 60 seconds to 3.</description>
      </property>
      <property> 
        <name>ipc.client.connect.max.retries.on.timeouts</name>
        <value>2</value>
        <description>Reduce retries from the default of 45 to 2 (2 retries == 3 connection attempts).</description>
      </property>
      <property> 
        <name>dfs.namenode.avoid.read.stale.datanode</name>
        <value>true</value>
        <description>Enable stale state in hdfs</description>
      </property>
      <property> 
        <name>dfs.namenode.stale.datanode.interval</name>
        <value>20000</value>
        <description>Reduce from default 30 seconds</description>
      </property>
      <property> 
        <name>dfs.namenode.avoid.write.stale.datanode</name>
        <value>true</value>
        <description>Enable stale state in hdfs</description>
      </property>                 

— If using ACLs, you must explicitly add permissions for owner users before upgrading from CDH4.1.x

In CDH4.1.x, an HBase table could have an owner. The owner user had full administrative permissions on the table (RWXCA). These permissions were implicit (that is, they were not stored explicitly in the HBase acl table), but the code checked them when determining if a user could perform an operation.

The owner construct was removed as of CDH4.2.0, and the code now relies exclusively on entries in the acl table. Since table owners do not have an entry in this table, their permissions are removed on upgrade from CDH4.1.x to CDH4.2.0 or later.

Bug: None

Severity: Medium

Anticipated Resolution: None; use workaround

Workaround: If you are using ACLs, add permissions for owner users before upgrading from CDH4.1.x. (You can run scan 'acl' from the HBase shell if you are not sure whether ACLs are in use.) You can automate the task of making the owner users' implicit permissions explicit by putting code similar to the following in a file and running it as a script from the HBase shell. (See the HBase Shell page for information on using the HBase shell.)
PERMISSIONS = 'RWXCA'

# Assumes the surrounding shell session has already set up:
#   tables   - the table descriptors (for example, from HBaseAdmin.listTables)
#   protocol - an AccessControllerProtocol handle on the acl table
#   LOG      - a logger
tables.each do |t|
  table_name = t.getNameAsString
  owner = t.getOwnerString
  LOG.warn("Granting " + owner + " with " + PERMISSIONS + " for table " + table_name)
  user_permission = UserPermission.new(owner.to_java_bytes, table_name.to_java_bytes,
                                       nil, nil, PERMISSIONS.to_java_bytes)
  protocol.grant(user_permission)
end

— Must upgrade the Master before upgrading the RegionServers

If you upgrade a RegionServer from a CDH4 release earlier than CDH4.2.0 and attempt to offload the data from that RegionServer, the operation will fail unless the Master is already running CDH4.2 or later.

Bug: HBASE-6927

Severity: High

Workaround: Upgrade the Master first and run the unload from the Master, instead of the RegionServer.

— Rolling restart will fail if HLog compression is enabled and some nodes are not running CDH4.2 or later

Bug: None

Severity: Medium

Resolution: None; use workaround

Workaround: Enable HLog compression only on a cluster in which all the nodes are running CDH4.2 or later.
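Once every node is running CDH4.2 or later, HLog compression can be turned on in hbase-site.xml (a sketch; this is a Region Server setting):

```xml
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
  <description>Compress HLog (WAL) entries; requires all nodes on CDH4.2 or later.</description>
</property>
```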

— Change in default splitting policy from ConstantSizeRegionSplitPolicy to IncreasingToUpperBoundRegionSplitPolicy may create too many splits

Under the new policy, the split size is the number of regions of the same table on that Region Server, squared, times the region flush size, capped at the maximum region split size. For example, if the flush size is 128MB, the first flush triggers a split, producing two regions that will each split at 2 * 2 * 128MB = 512MB. If one of those regions splits, there are three regions and the split size becomes 3 * 3 * 128MB = 1152MB, and so on, until the configured maximum file size is reached; from then on the maximum is used.

This new default policy could create many splits if you have many tables in your cluster.

The default flush size has also changed, from 64MB to 128MB, and the eventual region split size, hbase.hregion.max.filesize, is now 10GB (it was 1GB).
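The size computation described above can be sketched in a few lines (illustrative only; split_size_mb and its defaults mirror the CDH4.2 defaults, not an actual HBase API):

```ruby
# Sketch of IncreasingToUpperBoundRegionSplitPolicy's threshold computation.
# regions: the number of regions of the same table on this Region Server.
# The threshold grows as regions**2 * flush size, capped at the max file size.
def split_size_mb(regions, flush_mb = 128, max_mb = 10 * 1024)
  [regions * regions * flush_mb, max_mb].min
end

split_size_mb(1)   # first flush: 128MB, so the first region splits right away
split_size_mb(2)   # two regions: 2 * 2 * 128MB = 512MB
split_size_mb(3)   # three regions: 3 * 3 * 128MB = 1152MB
split_size_mb(10)  # capped at hbase.hregion.max.filesize: 10240MB
```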

Bug: None

Severity: Medium

Resolution: None; use workaround

Workaround: If you find you are getting too many splits, either go back to the old split policy or increase hbase.hregion.memstore.flush.size.

— Column family manipulations are binary-incompatible between CDH4.2 and CDH4.0/CDH4.1

Because of HBASE-5357, code compiled against CDH4.0 and CDH4.1 will fail with java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V, if used with the CDH4.2 libraries. The reason is that the setter methods in HColumnDescriptor were modified to return HColumnDescriptor instead of void, which changes their signature. Code that only does data manipulations, using the HTable class, will still work without recompilation.

Bug: HBASE-8273

Severity: Medium

Resolution: None planned; use workaround.

Workaround: Code compiled against CDH4.0 and 4.1 that uses HColumnDescriptor must be recompiled against CDH4.2 in order to work with the CDH4.2 libraries. Code compiled against CDH4.0 and CDH4.1 running with those (4.0 and 4.1) libraries does not have this problem.

— In a non-secure cluster, MapReduce over HBase does not properly handle splits in the BulkLoad case

You may see errors because of:

  • missing permissions on the directory that contains the files to bulk load
  • missing ACL rights for the table/families

Bug: None

Severity: Medium

Resolution: None; use workaround.

Workaround: In a non-secure cluster, execute BulkLoad as the hbase user.
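For example, when loading HFiles with LoadIncrementalHFiles, the whole operation can be run as the hbase user (the HDFS path and table name below are placeholders):

```shell
# Run the bulk load as the hbase user so file-permission and ACL checks pass
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/hbase/bulkload-output tableName
```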
  Note: For important information about configuration that is required for BulkLoad in a secure cluster as of CDH4.3, see Incompatible Changes.

— During an upgrade from CDH3 to CDH4, regions in transition may cause HBase startup failures.

Bug: None

Severity: Medium

Workaround: Delete the /hbase ZNode in ZooKeeper before starting up CDH4.
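The znode can be removed with the ZooKeeper shell that ships with HBase, for example:

```shell
# Open a ZooKeeper shell against the cluster's quorum
hbase zkcli
# ...then, at the zkcli prompt, remove HBase's state recursively:
rmr /hbase
```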

— Pluggable compaction and scan policies via coprocessors (HBASE-6427) not supported

Cloudera does not provide support for user-provided custom coprocessors.

Bug: HBASE-6427

Severity: Low

Resolution: None planned

Workaround: None

— Custom constraints coprocessors (HBASE-4605) not supported

The constraints coprocessor feature provides a framework for constraints and requires you to add your own custom code. Cloudera does not support user-provided custom code, and hence does not support this feature.

Bug: HBASE-4605

Severity: Low

Resolution: None planned

Workaround: None

— Pluggable split key policy (HBASE-5304) not supported

Cloudera supports the two split policies that are supplied and tested: ConstantSizeSplitPolicy and PrefixSplitKeyPolicy. The code also provides a mechanism for custom policies that are specified by adding a class name to the HTableDescriptor. Custom code added via this mechanism must be provided by the user. Cloudera does not support user-provided custom code, and hence does not support this feature.

Bug: HBASE-5304

Severity: Low

Resolution: None planned

Workaround: None

— During an upgrade from CDH3 to CDH4, problems opening HBase internal files are possible

During an upgrade from CDH3 to CDH4, some users have encountered problems opening HBase internal files because of a bad version warning. They receive a warning similar to the following:
2012-07-18 05:55:01,152 ERROR handler.OpenRegionHandler (OpenRegionHandler.java:openRegion(346)) -
Failed open of region=user_mappings,080112102AA76EF98197605D341B9E6C5824D2BC|1001,
1317824890618.eaed0e7abc6d27d28ff0e5a9b49c4c0d.
java.io.IOException: java.lang.IllegalArgumentException: Invalid HFile version: 842220600 (expected to be between 1 and 2) 

Bug: None

Severity: Medium

Workaround: Use the new hbck options to locate and sideline the corrupted HBase hfiles:

Check and find all corrupted files:

hbase hbck -checkCorruptHFiles <table>

Sideline (move) corrupted files:

hbase hbck -sidelineCorruptHFiles <table>

— Unable to stop a master that is waiting on -ROOT- during initialization.

The HBase master server may hang on shutdown if it has not yet assigned -ROOT-, with a thread dump similar to the following:
"master-sv4r20s12,10302,1331916142866" prio=10 tid=0x00007f3708008800 nid=0x4b20 in Object.wait() [0x00007f370d1d0000]
 java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000006030be3f8> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
	- locked <0x00000006030be3f8> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
	- locked <0x00000006030be3f8> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:313)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:571)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:336)
	at java.lang.Thread.run(Thread.java:662) 

Bug: HBASE-5594

Severity: Low

Workaround: You can kill -9 the HBase master process safely if this happens.

— CDH4 Beta 2 (and later) clients cannot communicate with servers older than CDH4 Beta 2.

Although this compatibility was never guaranteed, it did work prior to the changes made in HBASE-5209.

Bug: HBASE-5209

Severity: Low

Resolution: None

Workaround: None

— HBase may not tolerate HDFS root directory changes.

While HBase is running, do not stop the HDFS instance running under it and restart it again with a different root directory for HBase.

Bug: None

Severity: Medium

Resolution: None

Workaround: None

— AccessController postOperation problems in asynchronous operations

When security and AccessControl are enabled, the following problems occur:

  • If a Delete Table fails for a reason other than missing permissions, the access rights are removed but the table may still exist and may be used again.
  • If hbaseAdmin.modifyTable() is used to delete column families, the rights are not removed from the Access Control List (ACL) table. The postOperation is implemented only for postDeleteColumn().
  • If Create Table fails, full rights for that table persist for the user who attempted to create it. If another user later succeeds in creating the table, the user who made the failed attempt still has the full rights.

Bug: HBASE-6992

Severity: Medium

Workaround: None

— Native library not included in tarballs

The native library that enables Region Server page pinning on Linux is not included in tarballs. This could impair performance if you install HBase from tarballs.

Bug: None

Severity: Low

Resolution: None planned

Workaround: None