This is the documentation for Cloudera Manager 4.8.2.

The Solr Service

Cloudera Search is implemented by the Solr service.

Installing the Solr Service

You can install the Solr service using one of the following two methods:
  • Automated Installation by Cloudera Manager: This method (Installation Path A) performs a CDH installation using Cloudera Manager's installation wizard. As part of the installation, Cloudera Manager offers to install Cloudera Search. You can install CDH (and Search) using parcels (recommended) or packages. See the Cloudera Manager Installation Guide for further information.
  • Installation Using Your Own Method: If you use this method (Installation Path B), then you need to follow the instructions given at "Installing Cloudera Search" in the Cloudera Search Installation Guide for manually installing all the required packages.
The automated installation by Cloudera Manager installs the Cloudera Search packages, but does not configure or start the service automatically. You must use the Add Service workflow to add the Solr service. The following sections guide you through adding the Solr service and using it with Flume and Hue.

Adding a Solr Service

  Note: HDFS and ZooKeeper services must be running before adding the Solr service.
  Note: Cloudera Manager allows you to add Solr and Key-Value Store Indexer services even if the CDH version deployed in your cluster (for example, CDH 4.4) does not support Cloudera Search. However, you will not be able to start the services.
  1. Connect to the Cloudera Manager Admin Console.
  2. Click the Services tab, then choose All Services.
  3. From the Actions menu, select Add a Service.
  4. Choose the Solr service.
  5. Follow the wizard to add the Solr service to your cluster, selecting the hosts on which to add and configure the Solr Servers.

After completing the wizard, Cloudera Manager automatically initializes Solr home in ZooKeeper and HDFS.

Once you have set up the Solr service, you can create collections by following the instructions in the Cloudera Search Installation Guide, in the section "Deploying Cloudera Search in SolrCloud Mode", under the heading "Administering Solr with the solrctl Tool."
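
As a rough illustration of what creating a collection with solrctl can look like, the following sketch prints (and optionally runs) a typical command sequence. The collection name, configuration directory, and shard count are assumptions chosen for the example, and the run helper is a hypothetical wrapper, not part of solrctl. Treat the Cloudera Search Installation Guide as the authoritative source for solrctl syntax.

```shell
#!/bin/sh
# Illustrative values only; none of these come from this guide.
COLLECTION="collection1"
CONFIG_DIR="$HOME/solr_configs"
NUM_SHARDS=2

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Generate a template configuration directory to customize.
run solrctl instancedir --generate "$CONFIG_DIR"
# Upload the configuration to ZooKeeper under the collection's name.
run solrctl instancedir --create "$COLLECTION" "$CONFIG_DIR"
# Create the collection with the chosen number of shards.
run solrctl collection --create "$COLLECTION" -s "$NUM_SHARDS"
```

By default the script only prints the commands. Setting DRY_RUN=0 on a host where solrctl is installed and the Solr service is running would execute them.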

Using Flume with Search

To use a Flume Solr sink, the Flume service must be running on your cluster. See The Flume Service.

Configuring Flume Morphline Solr Sink for use with the Solr Service

See the section "Flume Near Real-Time Indexing Reference" in the Cloudera Search User Guide for information about how to configure the Flume Morphline Solr Sink.

Cloudera Manager provides a set of configuration settings under the Flume Service to help configure Flume Morphline Solr Sink. These settings are templates that you will need to modify for your deployment.
  1. Go to the Flume service.
  2. Select Configuration > View and Edit.
  3. Under the Agent role group, find the Configuration File property that holds the flume.conf file. This is the primary configuration file for Flume agents. Modify this file (or paste your own version in here). Note that there could be more than one Agent role group -- if so, you will need to configure each one appropriately.
  4. Under the Agent role group, go to the Flume-NG Solr Sink category. Here you will find the following properties:
    • Morphlines File (morphlines.conf): Configures Morphlines for Flume agents. Use $ZK_HOST in this file instead of specifying a ZooKeeper quorum. Cloudera Manager automatically replaces the $ZK_HOST variable with the correct value during the Flume configuration deployment.
    • Custom MIME-types File (custom-mimetypes.xml): For use with the detectMimeTypes command. See the Cloudera Morphlines Reference Guide for details on this command.
    • Grok Dictionary File (grok-dictionary.conf): For use with the grok command. See the Cloudera Morphlines Reference Guide for details on this command.

Once configuration is complete, Cloudera Manager automatically deploys the required files to the Flume agent's process directory when it starts the Flume agent. You can therefore reference these files in the Flume agent's configuration file by file name alone, without a directory path. For example, in flume.conf you can use the name morphlines.conf to refer to the morphlines configuration file.
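
The two fragments below sketch how these pieces might fit together. They are configuration excerpts, not complete files. The agent name (agent), sink name (solrSink), channel name (memoryChannel), and collection name (collection1) are assumptions made for illustration; replace them with the names used in your deployment.

A possible flume.conf excerpt that refers to the morphlines file by name only:

```
# Hypothetical agent, sink, and channel names
agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.solrSink.channel = memoryChannel
# File name only: Cloudera Manager deploys the file to the agent's process directory
agent.sinks.solrSink.morphlineFile = morphlines.conf
```

A possible morphlines.conf excerpt that uses $ZK_HOST rather than a hard-coded ZooKeeper quorum:

```
SOLR_LOCATOR : {
  # Hypothetical collection name
  collection : collection1
  # Replaced by Cloudera Manager when the Flume configuration is deployed
  zkHost : "$ZK_HOST"
}
```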

Deploying Search with Hue

In CDH 4.3 and earlier, to use Cloudera Search with Hue you must update the URL for the Solr Server in the Hue Server safety valve.
  1. Go to the Hue service.
  2. Select Configuration > View and Edit.
  3. Search for the word "safety". This displays a set of Hue Safety Valve properties.
  4. Add information about your Solr host to the Hue Server Configuration Safety Valve for hue_safety_valve_server.ini found under the Hue Server (Default) / Advanced category. For example, if your hostname is SOLR_HOST, you might add the following:
    [search]
    ## URL of the Solr Server
    solr_url=http://SOLR_HOST:8983/solr
  5. Click Save Changes to save your safety valve changes.
  6. Restart the Hue Service.
  Important: If you are using parcels with CDH 4.3, you must register the "hue-search" application manually, as described in the following steps, or access will fail. You do not need to do this if you are using CDH 4.4 or later.
  1. Stop the Hue service.
  2. From the command line do the following:
    1. cd /opt/cloudera/parcels/CDH4.3.0-1.cdh4.3.0.pXXX/share/hue
      (If your local repository path differs from /opt/cloudera/parcels/..., substitute your own, and specify the name of the CDH 4.3 parcel that exists in your repository.)
    2. ./build/env/bin/python ./tools/app_reg/app_reg.py --install /opt/cloudera/parcels/SOLR-0.9.0-1.cdh4.3.0.pXXX/share/hue/apps/search
    3. sed -i 's/\.\/apps/..\/..\/..\/..\/..\/apps/g' ./build/env/lib/python2.X/site-packages/hue.pth
      where python2.X is the Python version you are using (for example, python2.4).
  3. Start the Hue service.
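
To see what the sed substitution in the steps above does before running it against the real file, you can apply the same expression to a sample line. The sketch below rewrites every occurrence of ./apps as ../../../../../apps. The sample entry is a hypothetical stand-in; the actual contents of hue.pth depend on your installation.

```shell
#!/bin/sh
# Hypothetical sample entry; real hue.pth contents depend on your installation.
sample='./apps/search/src'

# Same expression as in the registration steps, applied to standard input
# instead of editing a file in place with -i.
fixed=$(printf '%s\n' "$sample" | sed 's/\.\/apps/..\/..\/..\/..\/..\/apps/g')

echo "$fixed"
```

For this sample line, the script prints ../../../../../apps/search/src. Running the expression without -i first is a low-risk way to preview the change.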