This is the documentation for Cloudera Search CDH 5 Beta 2 and 1.2.0 for CDH 4.
Documentation for other versions is available at Cloudera Documentation.

New Features in Cloudera Search Version 1.1.0

  • HBase Batch Indexing:
    • Cloudera Search supports batch indexing of HBase tables using MapReduce jobs. This batch indexing uses neither the HBase replication feature nor the Lily HBase Indexer Service, and it does not require registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service. For more information, see Using the Lily HBase Batch Indexer for Indexing.
    • Search supports emitting zero or more Solr documents for each HBase input row or HBase input cell. Previously, exactly one Solr document had to be emitted.
    • Search supports dynamic output fields when indexing HBase with the extractHBaseCells morphline command. An outputField parameter that ends with a * wildcard enables dynamic output fields.

      For example, with this configuration:

        inputColumn : "m:e:*"
        outputField : "belongs_to_*"

      and these puts in HBase:

        put 'table_name', 'row1', 'm:e:1', 'foo'
        put 'table_name', 'row1', 'm:e:9', 'bar'

      the fields of the resulting Solr document are:

        belongs_to_1 : foo
        belongs_to_9 : bar
  • The Cloudera CDK has been updated to CDK 0.8.1, which includes updates to Cloudera Morphlines functionality. For information on the changes in this CDK release, see the Release Notes. For the latest Cloudera CDK documentation, see Cloudera Development Kit.
  • Apache Tika has been upgraded to version 1.4 (tika-1.4).
  • The Lily HBase Indexer supports Kerberos authentication, so Search can use it to index data stored on HBase servers that require Kerberos authentication.
  • Search supports Sentry for providing authorization control. For more information, see Configuring Sentry for Search.
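The wildcard mapping in the extractHBaseCells example can be modeled in a few lines of Python. This is a minimal sketch, not the morphline implementation: the function name extract_dynamic_fields and the dict-of-cells row model are hypothetical. It assumes, as the example suggests, that the part of the column qualifier matched by the trailing * in inputColumn replaces the trailing * in outputField.

```python
from typing import Dict, Tuple

# Hypothetical in-memory model of one HBase row:
# keys are (family, qualifier) pairs, values are cell values.
Cells = Dict[Tuple[str, str], str]


def extract_dynamic_fields(cells: Cells, input_column: str, output_field: str) -> Dict[str, str]:
    """Map cells matched by a wildcard inputColumn onto dynamic field names.

    Assumption (illustrative only): the text matched by the trailing '*'
    in input_column replaces the trailing '*' in output_field.
    """
    family, qualifier_pattern = input_column.split(":", 1)
    if not qualifier_pattern.endswith("*") or not output_field.endswith("*"):
        raise ValueError("both patterns must end with a '*' wildcard")
    qualifier_prefix = qualifier_pattern[:-1]  # "e:" for "m:e:*"
    field_prefix = output_field[:-1]           # "belongs_to_" for "belongs_to_*"

    doc: Dict[str, str] = {}
    # Sort so the resulting field order is deterministic.
    for (cell_family, qualifier), value in sorted(cells.items()):
        if cell_family == family and qualifier.startswith(qualifier_prefix):
            suffix = qualifier[len(qualifier_prefix):]  # text matched by '*'
            doc[field_prefix + suffix] = value
    return doc


if __name__ == "__main__":
    # The two puts from the example: 'm:e:1' -> 'foo' and 'm:e:9' -> 'bar'.
    row1 = {("m", "e:1"): "foo", ("m", "e:9"): "bar"}
    print(extract_dynamic_fields(row1, "m:e:*", "belongs_to_*"))
    # {'belongs_to_1': 'foo', 'belongs_to_9': 'bar'}
```

Cells outside family m, or whose qualifier does not start with e:, are simply skipped, so they produce no dynamic field.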
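The change from exactly one Solr document per HBase input to zero or more can be illustrated the same way. The sketch below is hypothetical: docs_for_row, the per_cell switch, and the id scheme are illustrations, not Lily HBase Indexer settings. In per-row mode, a row with no matching cells yields no document. In per-cell mode, every matching cell yields its own document.

```python
from typing import Dict, List, Tuple

# Hypothetical row model: (family, qualifier) -> cell value.
Cells = Dict[Tuple[str, str], str]


def docs_for_row(row_key: str, cells: Cells, family: str, per_cell: bool) -> List[Dict[str, str]]:
    """Return zero or more Solr-style documents for one HBase row (illustrative model)."""
    # Collect (qualifier, value) pairs in the requested family, sorted for determinism.
    matching = sorted((qual, val) for (fam, qual), val in cells.items() if fam == family)
    if per_cell:
        # One document per matching cell; the id combines row key and qualifier.
        return [{"id": f"{row_key}/{qual}", "value": val} for qual, val in matching]
    if not matching:
        # Nothing to index for this row: emit zero documents.
        return []
    # Per-row mode: a single document holding every matching cell.
    doc = {"id": row_key}
    doc.update(dict(matching))
    return [doc]


if __name__ == "__main__":
    row1 = {("m", "e:1"): "foo", ("m", "e:9"): "bar"}
    print(docs_for_row("row1", row1, "m", per_cell=True))
    print(docs_for_row("row1", row1, "m", per_cell=False))
    print(docs_for_row("row1", row1, "x", per_cell=False))
```

Returning a list rather than a single document is what allows both the empty case and the one-document-per-cell case.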