You can view detailed information about each host, including:
- Name, IP address, rack ID
- Health status of the host and last time the Cloudera Manager Agent sent a heartbeat to the Cloudera Manager Server
- Number of cores
- System load averages for the past 1, 5, and 15 minutes
- Memory usage
- File system disks, their mount points, and usage
- Health Test Results for the host
- Charts showing a variety of metrics and health test results over time
- Role instances running on the host and their health
- CPU, memory, and disk resources used for each role instance
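To relate the load averages to the number of cores listed above, the following minimal sketch divides each load average by the core count. It is only an illustration run locally on a host, not how Cloudera Manager collects or displays these values; the helper name `load_per_core` is hypothetical.

```python
import os


def load_per_core(load_averages, cores):
    """Divide each load average by the core count (hypothetical helper)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return tuple(avg / cores for avg in load_averages)


if __name__ == "__main__":
    # os.getloadavg() exists on Unix-like systems only, so guard the call.
    if hasattr(os, "getloadavg"):
        averages = os.getloadavg()   # 1-, 5-, and 15-minute load averages
        cores = os.cpu_count() or 1  # cpu_count() may return None
        for window, value in zip((1, 5, 15), load_per_core(averages, cores)):
            print(f"{window:>2}-minute load per core: {value:.2f}")
```

A per-core value that stays well above 1.0 across all three windows usually suggests the host has more runnable work than cores, which is one way to read the raw load figures.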
To view detailed host information:
The Status page is displayed when a host is initially selected and provides summary information about the status of the selected host. Use this page to gain a general understanding of work being done by the system, the configuration, and health status.
If this host has been decommissioned or is in maintenance mode, an icon for each of these states appears in the top bar of the page next to the status message.
This panel provides basic system configuration such as the host's IP address, rack, health status summary, and disk and CPU resources. This information summarizes much of the detailed information provided in other panes on this tab.
- To view details about the Host agent, click the link at the far right in the Details section.
Cloudera Manager monitors a variety of metrics that are used to indicate whether a host is functioning as expected. The Health Tests panel shows health test results in an expandable/collapsible list, typically with the specific metrics that the test returned. You can expand or collapse all entries using the Expand All and Collapse All links at the upper right of the Health Tests panel.
- The color of the text (and the background color of the field) for a health test result indicates its status. The tests are sorted by their health status: Good, Concerning, Bad, or Disabled. Entries for Good and Disabled health tests are collapsed by default; Bad and Concerning results are shown expanded.
- The text of a health test also acts as a link to further information about the test. Clicking the text opens a pop-up window with further information, such as the meaning of the test and its possible results, suggested actions, and how to make configuration changes related to the test. The help text for a health test also provides a link to the relevant monitoring configuration section for the service. See Configuring Monitoring Settings for more information.
- The small heatmap icon to the right of some of the tests takes you to a heatmap display that lets you compare the values of the relevant test result metrics across the nodes of your cluster.
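The sorting and default expand/collapse behavior described above can be sketched as follows. This is an illustration only, not Cloudera Manager code: the `HealthTest` class, the `arrange` function, and the sample test names are hypothetical, and treating the order in which the statuses are listed as the sort order is an assumption.

```python
from dataclasses import dataclass

# Assumption: the order in which the text lists the statuses is the sort order.
STATUS_ORDER = ["Good", "Concerning", "Bad", "Disabled"]
# Per the text, only Bad and Concerning results are shown expanded by default.
EXPANDED_BY_DEFAULT = {"Bad", "Concerning"}


@dataclass
class HealthTest:
    name: str
    status: str


def arrange(tests):
    """Return (name, status, expanded) tuples sorted by health status."""
    ordered = sorted(tests, key=lambda t: STATUS_ORDER.index(t.status))
    return [(t.name, t.status, t.status in EXPANDED_BY_DEFAULT) for t in ordered]


if __name__ == "__main__":
    sample = [
        HealthTest("Clock Offset", "Bad"),
        HealthTest("Swapping", "Good"),
        HealthTest("Agent Status", "Concerning"),
        HealthTest("DNS Resolution", "Disabled"),
    ]
    for name, status, expanded in arrange(sample):
        state = "expanded" if expanded else "collapsed"
        print(f"{status:<11} {name:<15} {state}")
```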
The Health History provides a record of state transitions of the Health Tests for the host.
- Click the arrow symbol at the left to view the description of the health test state change.
- Click the View link to open a new page that shows the state of the host at the time of the transition. Note that in this view some of the status settings are greyed out, as they reflect a time in the past, not the current status.
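As a rough illustration of what a record of state transitions means, the sketch below reduces a series of timestamped health states to the points where the state changed. The function name `state_transitions` and the sample data are hypothetical; how Cloudera Manager stores its health history is not described here.

```python
def state_transitions(samples):
    """Return (timestamp, old_state, new_state) for each change in state.

    `samples` is an iterable of (timestamp, state) pairs in time order.
    """
    changes = []
    previous = None
    for timestamp, state in samples:
        if previous is not None and state != previous:
            changes.append((timestamp, previous, state))
        previous = state
    return changes


if __name__ == "__main__":
    # Illustrative samples only: (minute, health state).
    history = [
        (0, "Good"), (1, "Good"), (2, "Concerning"),
        (3, "Bad"), (4, "Bad"), (5, "Good"),
    ]
    for when, old, new in state_transitions(history):
        print(f"t={when}: {old} -> {new}")
```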
The File systems panel provides information about disks, their mount points and usage. Use this information to determine if additional disk space is required.
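For a quick check of the same kind of figure directly on a host, the following sketch uses Python's standard `shutil.disk_usage` to report how full a mount point is. It is independent of Cloudera Manager; the helper name `percent_used` is hypothetical, and the root path is only an example mount point.

```python
import os
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total (hypothetical helper)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


if __name__ == "__main__":
    mount_point = os.path.abspath(os.sep)   # example: the root file system
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
    print(f"{mount_point}: {percent_used(usage.total, usage.used):.1f}% used, "
          f"{usage.free} bytes free")
```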
Use the Roles panel to see the role instances running on the selected host, as well as each instance's status and health. Host machines are configured with one or more role instances, each of which corresponds to a service. The role indicates which daemon runs on the host. Examples of roles include NameNode, Secondary NameNode, Balancer, JobTracker, DataNode, and RegionServer. Typically a host will run multiple roles in support of the various services running in the cluster.
Clicking the role name takes you to the role instance's status page. Using the triangle to the right of the role name, you can directly access the tabs on the role page (such as the Processes, Commands, Configuration, or Audits tabs) as well as the status page for the parent Service of the role.
You can delete a role from the host from the Instances tab of the Service page for the parent service of the role. You can add a role to a host in the same way. See Adding Role Instances and Deleting a Role Instance.
Charts are shown for each host instance in your cluster.
See Viewing Charts for Cluster, Service, Role, and Host Instances for detailed information on the charts that are presented, and the ability to search and display metrics of your choice.
Health heat maps let you compare the status or performance of the different hosts in your cluster.
From the Health Tests panel for the host, you can access heatmaps that show related metrics for all the nodes in your cluster. To open one, click the small heatmap icon to the right of some of the tests in the Health Tests panel for the host you are viewing.
See Viewing Heatmaps for Services and Roles for more information — heatmaps for hosts are very similar to those for roles, and the explanation there applies to hosts as well.
The Processes page provides information about each of the processes that are currently running on this host. Use this page to access management web UIs, check process status, and access log information.
The Processes tab includes a variety of categories of information.
- Service — The name of the service. Clicking the service name takes you to the service status page. Using the triangle to the right of the service name, you can directly access the tabs on the service page (such as the Instances, Commands, Configuration, Audits, or Charts Library tabs).
- Instance — The role instance on this host that is associated with the service. Clicking the role name takes you to the role instance's status page. Using the triangle to the right of the role name, you can directly access the tabs on the role page (such as the Processes, Commands, Configuration, Audits, or Charts Library tabs) as well as the status page for the parent Service of the role.
- Name — The process name.
- Link — A link to the management interface for this role instance on this system. This is not available in all cases.
- Status — The current status for the process. Statuses include stopped, starting, running, and paused.
- PID — The unique process identifier.
- Uptime — The length of time this process has been running.
- Full log file — A link to the full log (a file external to Cloudera Manager) for this host.
- Stderr — A link to the stderr log (a file external to Cloudera Manager) for this host.
- Stdout — A link to the stdout log (a file external to Cloudera Manager) for this host.
The Resources page provides information about the resources (CPU, memory, disk, and ports) used by every service and role instance running on the selected host.
Each entry on this page lists:
- The service name
- The name of the particular instance of this service
- A brief description of the resource
- The amount of the resource being consumed or the settings for the resource
The resource information provided depends on the type of resource:
- CPU: An approximate percentage of the CPU resource consumed.
- Memory: The number of bytes consumed.
- Disk: The disk location where this service stores information.
- Port: The port number being used by the service to establish network connections.
The Commands page shows you running or recent commands for the host you are viewing. See Viewing Running and Recent Commands for more information.
The Configuration page for a host lets you set monitoring properties for the selected host. In addition, for parcel upgrades, you can blacklist specific products, that is, specify products that should not be distributed or activated on the host.
To modify the monitoring properties for the selected host:
- Select .
- Click the Monitoring category.
- Under Thresholds you can configure the thresholds for monitoring the free space in the Agent Log and Agent Process Directories for all your hosts. You can set these thresholds as a percentage, an absolute value (in bytes), or both.
- Under Other you can set health check thresholds for a variety of conditions related to memory usage and other properties. This is also where you can enable alerting for health check events for all your managed hosts.
The monitoring settings you make on this page will override the global host monitoring settings from the Configuration tab of the All Hosts page.
For more information, see Modifying Configuration Settings.
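To make the idea of percentage and absolute free-space thresholds concrete, here is a minimal sketch of how such a check might be evaluated. Everything in it is an assumption for illustration: the function names, the warning/critical pairing, and the rule that the worse of the two results wins are not taken from Cloudera Manager.

```python
# Assumed ranking used to pick the worse of two results.
SEVERITY = {"Good": 0, "Concerning": 1, "Bad": 2}


def check(value, warning, critical):
    """Classify one free-space value; a bound of None is treated as unset."""
    if critical is not None and value < critical:
        return "Bad"
    if warning is not None and value < warning:
        return "Concerning"
    return "Good"


def free_space_status(free_bytes, total_bytes,
                      pct=(None, None), absolute=(None, None)):
    """Evaluate free space against (warning, critical) threshold pairs.

    `pct` is in percent of total, `absolute` is in bytes. Assumption: when
    both kinds of threshold are set, the worse result is reported.
    """
    free_pct = 100.0 * free_bytes / total_bytes
    results = [check(free_pct, *pct), check(free_bytes, *absolute)]
    return max(results, key=SEVERITY.get)


if __name__ == "__main__":
    GIB = 2 ** 30
    # 8 GiB free of 100 GiB: below a 10% warning, above a 5% critical bound.
    print(free_space_status(8 * GIB, 100 * GIB, pct=(10, 5)))
```

Keeping the two threshold kinds as separate optional pairs mirrors the text's statement that either or both can be set.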
The Components page lists every component installed on this host. This may include components that have been installed but have not been added as a service (such as YARN, Flume, or Impala).
This includes the following information:
- Component — The name of the component.
- Version — The version of CDH from which each component came (CDH3 or CDH4).
- Component Version — The detailed version number for each component.
The Audits page lets you filter for audit events related to this host. See Audit Events for more information.
The Charts Library page for a host instance provides charts for all metrics kept for that host instance, organized by category. Each category is collapsible/expandable. See Viewing Charts for Cluster, Service, Role, and Host Instances for more information.