This is the documentation for Cloudera Manager 4.8.2.
Documentation for other versions is available at Cloudera Documentation.

Viewing Service Status

To view service status, do one of the following:

  • On the Home page, click the service name.
  • Pull down the menu from the Services tab and select the service instance you want to see.
  • Select Services tab > All Services and then click a link in the Name column.

For all service types there is a Status Summary that shows, for each configured role, the overall status and health of the role instance(s).

  Note: Not all service types provide complete monitoring and health information. Hue, Oozie, Hive, and YARN (CDH4 only) provide only the basic Status Summary.

Each service that supports monitoring provides a set of monitoring properties where you can enable or disable health tests and events, and set or modify the thresholds that determine the status of certain health checks. For more information, see Configuring Monitoring Settings.

The HDFS, MapReduce, HBase, ZooKeeper, and Flume services also provide additional information: a snapshot of service-specific metrics, health test results, health history, and a set of charts that provide a historical view of metrics of interest.

Viewing Past Status

The status and health information shown on this page represents the state of the service or role instance at a given point in time. By default, that point is the current time. The exceptions are the charts and the Logs and Events tabs, which show information for the time range currently selected on the Time Range Selector (which defaults to the past 30 minutes). You can view status for a past point in time by moving the time marker to a point in the past.

When you move the time marker to a point in the past (for services and roles that support health history), the Health Status clearly indicates that it refers to a past time. A Now button lets you quickly switch back to the current state of the service. In addition, the Actions menu is disabled while you are viewing past status, so that you cannot accidentally take an action based on outdated status information.

See Time Line for more details.
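
The behavior described above can be summarized in a short Python sketch. This is only an illustration of the rules stated in this section, not Cloudera Manager code; the names TimeView, default_time_range, and jump_to_now are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# The Time Range Selector defaults to the past 30 minutes (stated in the text).
DEFAULT_RANGE = timedelta(minutes=30)


def default_time_range(now: datetime) -> Tuple[datetime, datetime]:
    """Return the default (start, end) range used by charts, Logs, and Events."""
    return (now - DEFAULT_RANGE, now)


@dataclass
class TimeView:
    """Hypothetical model of the time marker on a status page."""
    now: datetime
    marker: Optional[datetime] = None  # None means "show the current time"

    @property
    def point_in_time(self) -> datetime:
        # Status and health are shown for a single point in time.
        return self.marker if self.marker is not None else self.now

    @property
    def viewing_past(self) -> bool:
        return self.marker is not None and self.marker < self.now

    @property
    def actions_enabled(self) -> bool:
        # The Actions menu is disabled while past status is displayed.
        return not self.viewing_past

    def jump_to_now(self) -> None:
        # Equivalent of the Now button: return to the current state.
        self.marker = None
```

For example, setting marker to two hours before now makes viewing_past true and actions_enabled false; calling jump_to_now() restores the current view.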

Status Summary

The Status Summary shows the status of each service instance being managed by Cloudera Manager. Even services such as Hue, Oozie, or YARN (which are not monitored by Cloudera Manager) show a status summary. The overall status for a service is a roll-up of the health check results for the service and all its role instances. The Status can be:

Table 1. Status

  • Started with outdated configuration: For a service, this indicates the service is running, but at least one of its roles is running with a configuration that does not match the current configuration settings in Cloudera Manager. For a role, this indicates a configuration change has been made that requires a restart, and that restart has not yet occurred.
  • Starting or stopping: The entity is starting up but is not yet running, or the entity is stopping but has not yet stopped.
  • Stopped: The entity is stopped, as expected.
  • Down: The entity is not running, but it is expected to be running.
  • History not available: The application is in historical mode, and the entity does not have historical monitoring support. This is the case for services other than HDFS, MapReduce, and HBase, such as ZooKeeper, Oozie, or Hue.
  • Status not available: The entity is not started or stopped in the same way as a regular service or role. Examples are the HDFS Balancer (which runs from the HDFS Rebalance action) or Gateway roles. The Start and Stop commands are not applicable to these instances.
  • None: The entity does not have a status. For example, it is not something that can be running and it cannot have health.
  • Good health: The entity is running with good health. For a specific health check, the returned result is normal or within the acceptable range. For a role or service, this means all health checks for that role or service are Good.
  • Concerning health: The entity is running with concerning health. For a specific health check, the returned result indicates a potential problem. Typically this means the test result has gone above (or below) a configured Warning threshold. For a role or service, this means that at least one health check is Concerning.
  • Bad health: The entity is running with bad health. For a specific health check, the check failed, or the returned result indicates a serious problem. Typically this means the test result has gone above (or below) a configured Critical threshold. For a role or service, this means that at least one health check is Bad.
  • Disabled health: The entity is running, but all of its health checks are disabled.
  • Unknown health: The entity is running, but there is not enough information to determine its health.
  • Unknown: The status of a service or role instance is unknown. This can occur for a number of reasons, for example because the Service Monitor is not running, or because connectivity to the agent doing the health monitoring has been lost.
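
The Good, Concerning, and Bad rules above can be sketched in Python as follows. This is a minimal illustration, not Cloudera Manager's implementation. It assumes a metric where higher values are worse, and it assumes that Bad takes precedence over Concerning when both occur (the table states only "at least one" for each). The names Health, classify, and roll_up are hypothetical.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that a larger value is a worse result.
    GOOD = 0
    CONCERNING = 1
    BAD = 2


def classify(value, warning, critical):
    """Classify one health check result against its thresholds.

    Assumption: higher values are worse. For a metric where lower is
    worse (for example, free space), the comparisons would be reversed.
    """
    if value > critical:
        return Health.BAD
    if value > warning:
        return Health.CONCERNING
    return Health.GOOD


def roll_up(check_results):
    """Roll up health check results into one health for a role or service.

    Good only if every check is Good; otherwise the worst result wins
    (assumed precedence). Returns None when there are no results, since
    the Disabled and Unknown cases are not modeled in this sketch.
    """
    results = list(check_results)
    if not results:
        return None
    return max(results)
```

With a Warning threshold of 80 and a Critical threshold of 95, a value of 85 classifies as Concerning, and a role whose checks are Good, Concerning, and Bad rolls up to Bad.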

You can click the Status link for a role to drill down to the details of the status of the role instance(s). If there is a single instance of the role type, the link takes you directly to the role instance status.

If there are multiple role instances (such as for DataNodes, TaskTrackers, or RegionServers), clicking to the right of the role's health displays a pop-up that allows you to select specific instances. This pop-up also displays the results for each health test that applies to this role type.

You can filter by an individual health check result. Click the result link; a marker appears by the link, and only the instance(s) with that specific health status appear in the instances list.
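
The effect of this filter can be illustrated with a small Python sketch. The data layout, the instance names, and the check names below are hypothetical; they are not Cloudera Manager identifiers.

```python
def filter_instances(instances, check_name, wanted):
    """Return the role instances whose named health check has the wanted result.

    instances maps a role instance name to a dict of
    {health check name: result string} (a hypothetical layout).
    """
    return [
        name
        for name, checks in instances.items()
        if checks.get(check_name) == wanted
    ]


# Hypothetical DataNode instances and health check results.
datanodes = {
    "datanode-1": {"free_space": "Good", "host_health": "Good"},
    "datanode-2": {"free_space": "Concerning", "host_health": "Good"},
    "datanode-3": {"free_space": "Concerning", "host_health": "Bad"},
}

# Only the instances with a Concerning free_space result remain in the list.
concerning = filter_instances(datanodes, "free_space", "Concerning")
```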

Service Summary

Some services (specifically HDFS, MapReduce, HBase, Flume, and ZooKeeper) provide additional statistics about their operation and performance. These are shown in a Summary panel at the left side of the page. The contents of this panel depend on the service. For example:

  • The HDFS Summary shows disk space usage.
  • The MapReduce Summary shows statistics on slot usage, jobs and so on.
  • The Flume Summary provides a link to a page of Flume metric details. See Flume Metric Details.
  • The ZooKeeper Summary provides links to the ZooKeeper role instances (nodes) as well as Zxid information if you have a ZooKeeper Quorum (multiple ZooKeeper servers).

You can hover over an individual metric to pop up a definition.

Other services, such as Hue, Oozie, Impala, and Cloudera Manager itself, do not provide a Service Summary.

Health Tests and Health History

The Health Tests and Health History panels appear for HDFS, MapReduce, HBase, Flume, Impala, ZooKeeper, and the Cloudera Manager service. Other services, such as Hue, Oozie, and YARN, do not provide a Health Tests panel.

The Health Tests panel shows health test results in an expandable and collapsible list, typically with the specific metrics that the test returned. You can Expand All or Collapse All from the links at the upper right of the Health Tests panel.

  • The color of the text (and the background color of the field) for a Health Test result indicates the status of the result. The tests are sorted by their health status: Good, Concerning, Bad, or Disabled. The lists of entries for Good and Disabled health tests are collapsed by default; Bad and Concerning results are shown expanded.
  • The text of a health test also acts as a link to further information about the test. Clicking the text will pop up a window with further information, such as the meaning of the test and its possible results, suggestions for actions you can take or how to make configuration changes related to the test. The help text for a health test also provides a link to the relevant monitoring configuration section for the service. See Configuring Monitoring Settings for more information.
  • In the Health Tests panel:
    • Clicking the icon next to an entry displays the list of health checks that contributed to the health test.
    • Clicking the small heatmap icon to the right of some of the tests takes you to a heatmap display that lets you compare the values of the relevant test result metrics across the nodes of your cluster.
  • In the Health History panel:
    • Clicking the icon next to an entry displays the list of health checks that contributed to the health history.
    • Clicking the Show link moves the time range to the historical time period.
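
The default sorting and expansion of the Health Tests panel described above can be sketched in Python. This is an illustration only. It assumes the sort order is the order in which the statuses are listed in the text (Good, Concerning, Bad, Disabled), which may not match the actual display order. The names STATUS_ORDER, COLLAPSED_BY_DEFAULT, and build_panel are hypothetical.

```python
# Assumed sort order: the order in which the text lists the statuses.
STATUS_ORDER = ["Good", "Concerning", "Bad", "Disabled"]

# Good and Disabled entries start collapsed; Bad and Concerning start expanded.
COLLAPSED_BY_DEFAULT = {"Good", "Disabled"}


def build_panel(tests):
    """Order (test name, status) pairs and attach the default expanded flag."""
    # sorted() is stable, so tests with the same status keep their input order.
    ordered = sorted(tests, key=lambda t: STATUS_ORDER.index(t[1]))
    return [
        {
            "test": name,
            "status": status,
            "expanded": status not in COLLAPSED_BY_DEFAULT,
        }
        for name, status in ordered
    ]
```

Calling build_panel with one test of each status returns the entries in the assumed order, with only the Concerning and Bad entries marked as expanded.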

Charts

HDFS, MapReduce, HBase, ZooKeeper, Flume, and Cloudera Management Services all display charts of some of the critical metrics related to their performance and health. Other services such as Hue, Oozie, and Hive do not provide charts.

See Viewing Charts for Cluster, Service, Role, and Host Instances for detailed information on the charts that are presented, and the ability to search and display metrics of your choice.

Flume Metric Details

From the Flume Service Status page, click the Flume Metric Details link in the Flume Summary panel to display details of the Flume agent roles.

On this page you can view a variety of metrics about the Channels, Sources and Sinks you have configured for your various Flume agents. You can view both current and historical metrics on this page.

The Channels section shows the metrics for all the channel components in the Flume service. These include metrics related to the channel capacity and throughput.

The Sinks section shows metrics for all the sink components in the Flume service. These include event drain statistics as well as connection failure metrics.

The Sources section shows metrics for all the source components in the Flume service.

Note that this page maintains the same navigation bar as the Flume service status page, so you can go directly to any of the other tabs (Instances, Commands, Configuration, or Audits).