This is the documentation for Cloudera Navigator 2.0.0.
Documentation for other versions is available at Cloudera Documentation.

Cloudera Navigator Metadata Server

Describes how the Cloudera Navigator Metadata Server extracts metadata from the entities managed by Cloudera Manager, and how to add and configure the Navigator Metadata Server role.

About Metadata Extraction

The Navigator Metadata Server extracts metadata for the following resource types from the servers listed:
  • HDFS - Extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint. However, if you have High Availability enabled, metadata is extracted as soon as it is written to the JournalNodes.
  • Hive - Extracts database and table metadata from the Hive Metastore Server.
  • MapReduce - Extracts job metadata from the JobTracker. The default setting in Cloudera Manager retains a maximum of five jobs, so if you run more than five jobs between Navigator extractions, the Navigator Metadata Server extracts only the five most recent jobs.
  • Oozie - Extracts Oozie workflows from the Oozie Server.
  • Pig - Extracts Pig script runs from the JobTracker or Job History Server.
  • Sqoop 1 - Extracts database and table metadata from the Hive Metastore Server.
  • YARN - Extracts job metadata from the Job History Server.
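
The MapReduce retention limit described above can be illustrated with a small model. This is a minimal sketch that assumes the JobTracker drops the oldest job first once the limit is reached; the function name extractable_jobs and the job names are hypothetical and are not part of Cloudera Navigator.

```python
from collections import deque


def extractable_jobs(jobs_run, retained_max=5):
    """Return the jobs still retained when the next extraction runs.

    Assumption: retention behaves like a fixed-size buffer that
    discards the oldest job whenever a new one arrives.
    """
    retained = deque(maxlen=retained_max)
    for job in jobs_run:
        retained.append(job)
    return list(retained)


# Eight jobs run between two extractions (hypothetical job names).
jobs = [f"job_{i}" for i in range(1, 9)]
visible = extractable_jobs(jobs)
missed = [job for job in jobs if job not in visible]
print("extracted:", visible)
print("never extracted:", missed)
```

Under this model, job_1 through job_3 are never seen by the Navigator Metadata Server, which is why the retention setting matters on clusters that run many jobs between extractions.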
If an entity is created at time t0 in the system, that entity will be extracted and linked in Navigator after the extraction poll period (default 10 minutes) plus a service-specific interval as follows:
  • HDFS: t0 + extraction poll period + HDFS checkpoint interval (default 1 hour)
  • HDFS + HA: t0 + extraction poll period
  • Hive: t0 + extraction poll period + Hive maximum wait time (default 60 minutes)
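
The formulas above can be worked through for a concrete creation time. This is a minimal sketch using the default values stated above; the function name latest_visible and the service keys are hypothetical, and treating the result as the time by which the entity appears is an assumption about how the formulas should be read.

```python
from datetime import datetime, timedelta

# Default extraction poll period from the text above.
POLL_PERIOD = timedelta(minutes=10)

# Service-specific intervals from the list above (default values).
SERVICE_INTERVAL = {
    "hdfs": timedelta(hours=1),      # HDFS checkpoint interval
    "hdfs_ha": timedelta(0),         # HA: no checkpoint wait
    "hive": timedelta(minutes=60),   # Hive maximum wait time
}


def latest_visible(created_at, service, poll_period=POLL_PERIOD):
    """Apply t0 + extraction poll period + service-specific interval."""
    return created_at + poll_period + SERVICE_INTERVAL[service]


t0 = datetime(2015, 1, 1, 12, 0)
for service in SERVICE_INTERVAL:
    print(service, latest_visible(t0, service).strftime("%H:%M"))
```

For an entity created at 12:00, the sketch gives 13:10 for HDFS and Hive and 12:10 for HDFS with High Availability. If you change the poll period or the checkpoint interval in your deployment, pass or edit the corresponding values.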

Adding and Starting the Navigator Metadata Server Role

  1. Do one of the following:
    • Select Clusters > Cloudera Management Service > Cloudera Management Service.
    • On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.
  2. Click the Instances tab.
  3. Click the Add Role Instances button. The Customize Role Assignments page displays.
  4. Assign the Navigator role to a host.
    1. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable, but you can reassign role instances to hosts of your choosing, if desired.

      Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the pageable hosts dialog.

      The following shortcuts for specifying hostname patterns are supported:
      • Range of hostnames (without the domain portion)
        Range Definition          Matching Hosts
        10.1.1.[1-4]              10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
        host[1-3].company.com     host1.company.com, host2.company.com, host3.company.com
        host[07-10].company.com   host07.company.com, host08.company.com, host09.company.com, host10.company.com
      • IP addresses
      • Rack name

      Click the View By Host button for an overview of the role assignment by hostname ranges.

  5. Click Finish. The Instances page displays.
  6. Check the checkbox next to the Navigator Metadata Server role.
  7. Select Actions for Selected > Start. Click Start to confirm the action.
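
The hostname range shortcuts listed in step 4 can be checked before you type them into the wizard. The following is a minimal sketch of how such a pattern expands, written to reproduce the three examples in the range table; it is not Cloudera Manager's own parser, the function name expand_host_range is hypothetical, and it assumes a single bracketed range per pattern.

```python
import re

# Matches one numeric range such as [1-4] or [07-10].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand a single [start-end] range into the matching hostnames."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present: the pattern is one host
    start, end = match.group(1), match.group(2)
    # Keep zero padding when the range is written with a leading zero.
    width = len(start) if start.startswith("0") else 0
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [
        f"{prefix}{str(n).zfill(width)}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


for example in ("10.1.1.[1-4]", "host[1-3].company.com", "host[07-10].company.com"):
    print(example, "->", ", ".join(expand_host_range(example)))
```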

Configuring the Navigator Metadata Server Storage Directory

Describes how to configure where the Navigator Metadata Server stores extracted data. The default is /var/lib/cloudera-scm-navigator.

  1. Do one of the following:
    • Select Clusters > Cloudera Management Service > Cloudera Management Service.
    • On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.
  2. Click the Configuration tab.
  3. Click the Navigator Metadata Server Default Group.
  4. Specify the directory in the Navigator Metadata Server Storage Dir property.
  5. Click Save Changes.
  6. Click the Instances tab.
  7. Check the checkbox next to the Navigator Metadata Server role.
  8. Select Actions for Selected > Restart.

Configuring the Navigator Metadata Server Port

Describes how to configure the port on which the Navigator Metadata UI is accessed. The default is 7187.

  1. Do one of the following:
    • Select Clusters > Cloudera Management Service > Cloudera Management Service.
    • On the Status tab of the Home page, in the Cloudera Management Service table, click the Cloudera Management Service link.
  2. Click the Configuration tab.
  3. Select Navigator Metadata Server Default Group > Ports and Addresses.
  4. Specify the port in the Navigator Metadata Server Port property.
  5. Click Save Changes.
  6. Click the Instances tab.
  7. Check the checkbox next to the Navigator Metadata Server role.
  8. Select Actions for Selected > Restart.
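
After the restart, you may want to confirm that the role is listening on the configured port. This is a minimal sketch that only tests whether a TCP connection can be opened; it does not verify that the Navigator Metadata UI itself is responding. The function name port_is_open is hypothetical, and 7187 is the default port stated above.

```python
import socket


def port_is_open(host, port=7187, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False
```

Call port_is_open with the hostname of the host running the Navigator Metadata Server role and, if you changed it, the port you set in the Navigator Metadata Server Port property. A result of False can also mean that a firewall blocks the port.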

Navigator Metadata Server Sizing and Performance Recommendations

Two activities determine Navigator Metadata Server resource requirements:
  • Extracting metadata from the cluster and creating relationships
  • Querying

The Navigator Metadata Server uses Solr to store, index, and query metadata. Indexing happens during extraction. Querying is fast and efficient because the data is indexed.

Memory and CPU requirements are based on the amount of data that is stored and indexed. With 6 GB of RAM and 8-10 cores, Solr can process 6 million entities in 25-30 minutes or 80 million entities in 8-9 hours. Less than 6 GB of RAM results in excessive garbage collection and possibly out-of-memory exceptions. For large clusters, Cloudera advises at least 8 GB of RAM and 8 cores. The Solr instance runs in process with Navigator, so set the Java heap for the Navigator Metadata Server according to the size of the cluster.
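
The two processing figures above imply different per-minute rates, which is worth keeping in mind when estimating extraction time for your own cluster. The following is a minimal arithmetic sketch based only on those two figures; the function name entities_per_minute is hypothetical, and the rates are rough guides rather than measured benchmarks.

```python
def entities_per_minute(entities, minutes):
    """Average processing rate implied by an entity count and a duration."""
    return entities / minutes


# (entity count, fastest time in minutes, slowest time in minutes)
figures = {
    "6 million entities": (6_000_000, 25, 30),
    "80 million entities": (80_000_000, 8 * 60, 9 * 60),
}

for label, (count, fastest, slowest) in figures.items():
    low = entities_per_minute(count, slowest)
    high = entities_per_minute(count, fastest)
    print(f"{label}: {low:,.0f} to {high:,.0f} entities per minute")
```

The implied rate is roughly 200,000 to 240,000 entities per minute at 6 million entities and roughly 148,000 to 167,000 at 80 million, so processing time grows somewhat faster than the entity count.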

By default, the Cloudera Manager first-run installation wizard assigns the Navigator Audit Server and Navigator Metadata Server to the same host as the Cloudera Management Service monitoring roles. This configuration works for a small cluster, but should be updated before the cluster grows. You can either change the configuration at installation time or move the Navigator Metadata Server later if necessary.

Moving a Navigator Metadata Server Role

  1. Stop the Navigator Metadata Server role, delete it from the existing host, and add it to a new host.
  2. If the Solr data path is not on NFS/SAN, move the data to the same path on the new host.
  3. Start the Navigator Metadata Server role.

Enabling Hive Metadata Extraction in a Secure Cluster

The Navigator Metadata Server connects to the Hive Metastore as the hue user, which is able to connect by default. However, if the Hive service Hive Metastore Access Control and Proxy User Groups Override property or the HDFS service Hive Proxy User Groups property has been changed from its default value to a setting that prevents the hue user from connecting, the Navigator Metadata Server cannot extract metadata from Hive. In that case, modify the affected property or properties so that the hue user can connect, as follows:
  1. Go to the Hive or HDFS service.
  2. Click the Configuration tab.
  3. Expand the Service-Wide > Proxy category.
  4. In the Hive service Hive Metastore Access Control and Proxy User Groups Override field or the HDFS service Hive Proxy User Groups field, click the Value column and add a new row.
  5. Type hue.
  6. Click Save Changes to commit the changes.
  7. Restart the service.