This is the documentation for CDH 4.6.0.
Documentation for other versions is available at Cloudera Documentation.

Apache HBase

  Important: CDH4.x HBase clients and HBase servers are wire compatible. This means that you can safely upgrade servers without upgrading clients, and safely upgrade clients without upgrading servers. For example:
  • If you have an HBase application written against CDH4.1.2 HBase, you can upgrade your HBase servers (RegionServer, Master) to CDH4.4.0 without upgrading the clients.

  • If you have an HBase application written against CDH4.4.0 HBase, you can roll back your HBase servers (RegionServer, Master) to CDH4.1.2 without rolling back the clients.

HBase Client Upgrade Incompatibilities

  Note: The following sections are relevant only for HBase client upgrades. If you are upgrading only the HBase servers and not the clients, you do not need to make any modifications to your existing configuration or source code.
  Important: If you are upgrading from CDH3, upgrade both ZooKeeper and HDFS to preserve data compatibility.

Upgrading to CDH4.0 or CDH4.1

Upgrading from CDH3 or CDH4 Beta requires you to update the HBase client code and recompile it.

Upgrading to CDH4.2 and later

These upgrades introduce several new features to the HBase server, but the features are turned off by default. The incompatibilities described below do not come into play unless you turn these features on. If you do turn them on, however, you will not be able to roll back to a previous version of CDH4.

  Note: Programs using the HBase client libraries from before CDH4 Beta 1 must replace the HBase JAR file with the one from CDH4.2 in order to interoperate with CDH4.2. Additionally, the ZooKeeper data format has changed, so any client programs that interact directly with the HBase ZooKeeper information must be recompiled against CDH4.2 client libraries in order to interoperate with CDH4.1 HBase.

The two main incompatible changes introduced are listed below.
  • HBase Checksums

    HBase 0.94 checksums are backward-incompatible with 0.92, which was the version delivered in CDH4.1.x. HBase 0.94, delivered in CDH4.2, introduces a new HFile format, V2.1. This format is incompatible with the format used in 0.92 in two ways: the data type for the version number is different, and checksums are stored in the internal data blocks. Neither of these incompatibilities comes into play until checksums are turned on, so CDH4.2 HBase turns checksums off by default. If you turn checksums on, however, you will not be able to roll back to CDH4.1.x, because HBase 0.92 will not be able to read the HFiles.

  • HBase Bloom Filters

    CDH4.2 HBase can produce bloom filters that are not backward compatible. HBase 0.94 introduces a new block type in an HFile. This block type is a bloom filter for deletes; it is written into an HFile whenever there are column-family deletes and bloom filters are turned on. HBase 0.92 does not have this block type, and any attempt to have an HBase 0.92 (CDH4.1) RegionServer read such a file results in the following error:

    java.io.IOException: Invalid HFile block magic: DFBLMET2
    	at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:124)
    	at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:135)
    	at org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:167)
    	at org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:76)
    	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1395)
    	at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader$1.nextBlock(HFileBlock.java:986)
    	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.<init>(HFileReaderV2.java:131)
    	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:426)
    	at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:435)
    	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1026)
    	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:485)
    	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:566)
    	at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:293)
    	at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:230)
    	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2534)
    	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:454)
    	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3308)
    	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3256)
    	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:331)
    	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:107)
    	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    	at java.lang.Thread.run(Thread.java:680)
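
When checking whether a rolled-back RegionServer has hit this bloom filter incompatibility, it can help to search its log for the error message shown above. The following is a minimal sketch only, not a Cloudera tool: the class name `HFileMagicScan`, the method name, and the sample log lines are hypothetical, and the only text taken from this page is the exception message itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HFileMagicScan {
    // Message text copied from the stack trace above.
    static final String MARKER = "Invalid HFile block magic: DFBLMET2";

    /** Returns the 1-based numbers of the log lines that contain the marker. */
    public static List<Integer> findIncompatibleBloomErrors(List<String> logLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < logLines.size(); i++) {
            if (logLines.get(i).contains(MARKER)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical log excerpt; a real check would read the RegionServer log file.
        List<String> sample = Arrays.asList(
            "INFO regionserver.HRegion: opening region (hypothetical line)",
            "java.io.IOException: Invalid HFile block magic: DFBLMET2",
            "\tat org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:124)");
        System.out.println(findIncompatibleBloomErrors(sample));
    }
}
```

A non-empty result suggests that at least one HFile written with delete bloom filters is being opened by a RegionServer that does not understand the block type.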

Other Incompatible Changes

  • Cloudera Manager 3.x uses older HBase client libraries and so is not compatible with CDH4.2 HBase. You should upgrade to Cloudera Manager 4.
  • When upgrading from CDH4.1.x to CDH4.2.x (and later), the following methods are no longer code-compatible and throw exceptions:
    org.apache.hadoop.hbase.client.replication.ReplicationAdmin.enablePeer()
    org.apache.hadoop.hbase.client.replication.ReplicationAdmin.disablePeer()
  • The ZOOKEEPER_CONF environment variable is not automatically included in the HBase classpath. If your applications or scripts (such as CopyTable) depend upon automatically picking up settings from zoo.cfg, you must augment your hbase-site.xml file with your specific ZooKeeper settings.
  • HBASE-5228 removes the 'transform' functionality from the REST server.
  • HBASE-6553 removes the Avro Gateway.
  • Thrift JMX port is now set to 10103 by default. CDH4.2 incorporates HBASE-7277, and so, by default, when the Thrift server is started, it will enable JMX at port 10103 unless HBASE_THRIFT_JMX_OPTS is set differently. This means that you will need to configure a JMX access rule and password if HBASE_THRIFT_JMX_OPTS is not set. Otherwise, the Thrift server will not start. You can configure the JMX access rule and password by means of the environment variable HBASE_JMX_OPTS.
  • REST JMX port is now set to 10105 by default. CDH4.2 incorporates HBASE-7274, and so, by default, when the REST server is started, it will enable JMX at port 10105 unless HBASE_REST_JMX_OPTS is set differently. This means that you will need to configure a JMX access rule and password if HBASE_REST_JMX_OPTS is not set. Otherwise, the REST server will not start. You can configure the JMX access rule and password by means of the environment variable HBASE_JMX_OPTS.
  • BulkLoad Co-processor (CDH4.3 and later)

    As of CDH4.3, there is a new secure BulkLoad co-processor. In a secure cluster, you must add the following properties to hbase-site.xml. BulkLoad jobs will no longer work with the previous configuration.
      <property>
        <name>hbase.coprocessor.region.classes</name>
        <value>
            org.apache.hadoop.hbase.security.token.TokenProvider,
            org.apache.hadoop.hbase.security.access.AccessController,
            org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint
        </value>
      </property>
      <property>
        <name>hbase.bulkload.staging.dir</name>
        <value>/tmp/hbase-staging</value>
      </property>
      
      Note: There should be no spaces or line breaks after the commas in the value field for hbase.coprocessor.region.classes. The snippet above is formatted purely for readability; do not copy and paste it as is.
    This change has the following ramifications for BulkLoad operations:
    • A CDH4.3 client cannot bulkload to a CDH4.2 server

    • A CDH4.2 client can bulkload to a CDH4.3 server

    • A CDH4.3 client can bulkload to a CDH4.3 server only after the configuration shown above has been applied
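
Because the value of hbase.coprocessor.region.classes must not contain spaces or line breaks, one way to avoid copy-and-paste mistakes is to generate the value from a list of class names. The sketch below is illustrative only: the class `CoprocessorValue` and its helper methods are hypothetical names, and the three co-processor class names are the ones listed in the configuration above.

```java
import java.util.Arrays;
import java.util.List;

public class CoprocessorValue {
    // Class names copied from the hbase-site.xml snippet above.
    public static final List<String> REGION_COPROCESSORS = Arrays.asList(
        "org.apache.hadoop.hbase.security.token.TokenProvider",
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint");

    /** Joins class names with commas only, trimming stray whitespace around each name. */
    public static String joinValue(List<String> classNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : classNames) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(name.trim());
        }
        return sb.toString();
    }

    /** Returns true if the value is non-empty and contains no whitespace at all. */
    public static boolean isSingleLine(String value) {
        return !value.isEmpty() && value.chars().noneMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String value = joinValue(REGION_COPROCESSORS);
        // Print the property with the value on a single line, with no spaces or line breaks.
        System.out.println("<property>");
        System.out.println("  <name>hbase.coprocessor.region.classes</name>");
        System.out.println("  <value>" + value + "</value>");
        System.out.println("</property>");
        System.out.println("single line: " + isSingleLine(value));
    }
}
```

The printed property element can then be placed in hbase-site.xml alongside the hbase.bulkload.staging.dir property.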