Cloudera Manager Architecture
The Cloudera Manager Server performs the following functions:
- Tracks the Cloudera Manager data model, which is stored in the Cloudera Manager Server database. The data model is a catalog of the available host machines in the cluster, and the services, roles, and configurations assigned to each host.
- Communicates with Agents to send configuration instructions and track Agents' heartbeats
- Executes commands to carry out management tasks
- Provides an Admin console for the operator to perform management and configuration tasks
- Creates, reads, validates, updates, and deletes configuration settings
- Calculates and displays the health of the cluster and its components
- Tracks host metrics such as disk usage, CPU, and RAM
- Provides a comprehensive set of APIs for the various features supported in Cloudera Manager
- Manages Kerberos credentials
- Monitors the health of Hadoop daemons and dozens of service performance metrics, and alerts you when they approach critical thresholds
- Keeps a history of activity monitoring data and configuration changes
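The data model described in the first item above (hosts, plus the services, roles, and configurations assigned to them) can be pictured with a minimal sketch. This is only an illustration: the class names, field names, host names, and the `data_dir` key are hypothetical and do not reflect the actual schema of the Cloudera Manager Server database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Role:
    """A role instance (for example, a DataNode) assigned to one host."""
    name: str
    role_type: str
    host: str
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class Service:
    """A service (for example, HDFS) with its roles and service-wide settings."""
    name: str
    roles: List[Role] = field(default_factory=list)
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataModel:
    """Hypothetical catalog of hosts and the services deployed on them."""
    hosts: List[str] = field(default_factory=list)
    services: List[Service] = field(default_factory=list)

    def roles_on_host(self, host: str) -> List[Role]:
        """Return every role assigned to the given host, across all services."""
        return [r for s in self.services for r in s.roles if r.host == host]


# Illustrative cluster: one HDFS service spread over two hosts.
model = DataModel(
    hosts=["host1.example.com", "host2.example.com"],
    services=[
        Service(
            name="hdfs1",
            roles=[
                Role("hdfs1-nn", "NAMENODE", "host1.example.com"),
                Role("hdfs1-dn", "DATANODE", "host2.example.com",
                     {"data_dir": "/data/1/dfs/dn"}),  # hypothetical key
            ],
        )
    ],
)
```

Calling `model.roles_on_host("host2.example.com")` returns the single DataNode role, which is the kind of per-host lookup the catalog makes possible.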
Each Agent starts and stops Hadoop daemons on its local host machine, collects statistics (overall and per-process memory and CPU usage), and tails logs. This data feeds the health calculations and the status shown in the Admin console.
The Cloudera Manager Agent runs as root so that it can make sure the required directories are created and that processes and files are owned by the appropriate user (for example, the hdfs user and mapred user).
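The heartbeat tracking between the Server and its Agents can be sketched as follows. This is an assumption-laden illustration rather than the Server's real logic: the `HeartbeatTracker` class, its method names, and the 60-second default timeout are all hypothetical, since the source does not state how heartbeats are stored or when an Agent is considered missing.

```python
import time
from typing import Dict, List, Optional


class HeartbeatTracker:
    """Records the last heartbeat per host and reports hosts that went quiet."""

    def __init__(self, timeout_seconds: float = 60.0) -> None:
        # The timeout is an assumed value, not a documented default.
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, host: str, now: Optional[float] = None) -> None:
        """Store the time of the latest heartbeat received from a host."""
        self._last_seen[host] = time.monotonic() if now is None else now

    def stale_hosts(self, now: Optional[float] = None) -> List[str]:
        """Return hosts whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            host
            for host, seen in self._last_seen.items()
            if current - seen > self.timeout_seconds
        )
```

Passing `now` explicitly keeps the sketch deterministic for testing; a real deployment would rely on the clock.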
What You Can Use Cloudera Manager to Do
Using Cloudera Manager, you can manage, configure, and supervise Hadoop daemons on a set of host machines.
The first time you start the Cloudera Manager Admin Console, you can use the Cloudera Manager wizard to:
- Install CDH and the Oracle JDK on cluster hosts.
- Optionally install Cloudera Impala (if installing on RHEL/CentOS 6).
- Configure and start services.
After the first run, you can use the Cloudera Manager Admin Console to:
- Configure CDH while seeing suggested ranges of values for parameters and illegal values highlighted; you can also configure override settings on specific hosts, and for specific role instances.
- Start and stop Hadoop daemons on hosts.
- Decommission individual roles, or all roles on a host, to facilitate host maintenance.
- View the health of your system and its components.
- View the daemons that are currently running.
- Add and reconfigure services and role instances.
- Specify dependencies between services. Configuration changes for a service are propagated to its dependent services.
- Generate CDH configurations for clients to use to connect to the cluster, and deploy those configurations automatically to clients.
- Manage rack locality configuration.
- With CDH4, configure HDFS High Availability or NameNode Federation.
- Download, distribute, and activate a new CDH version (CDH4.1.3 or later), all from within Cloudera Manager.
- Use the Cloudera Manager API to export or import deployment settings to and from clusters.
- Manage multiple clusters, which can be either CDH3 or CDH4 clusters.
- Display metrics about your jobs, such as the number of currently running tasks and their CPU and memory usage.
- Display metrics about your Hadoop services, such as the average HDFS I/O latency and the number of jobs running concurrently.
- Display metrics about your cluster, such as the average CPU load across all your machines.
- Get assistance with configuring Kerberos security (Cloudera Manager generates and installs the host and service keytab files for you).
- Temporarily suppress alerting for individual roles, services, hosts, or even the entire cluster to allow maintenance or troubleshooting without generating excessive alert traffic.
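The override settings for specific hosts and role instances mentioned in the list above suggest a layered lookup. The sketch below assumes one plausible precedence (service-wide value, then host override, then role-instance override); that ordering, the `effective_config` function, and the parameter names are hypothetical and are not taken from Cloudera Manager itself.

```python
from typing import Dict, Optional


def effective_config(
    service_defaults: Dict[str, str],
    host_overrides: Optional[Dict[str, str]] = None,
    role_overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge settings so that more specific layers win over general ones.

    Assumed precedence (lowest to highest):
    service defaults, host overrides, role-instance overrides.
    """
    merged = dict(service_defaults)      # copy, so the inputs stay untouched
    merged.update(host_overrides or {})  # host-level values replace defaults
    merged.update(role_overrides or {})  # role-instance values win last
    return merged


# Hypothetical parameter names, used only for illustration.
defaults = {"java_heap_mb": "1024", "replication": "3"}
resolved = effective_config(
    defaults,
    host_overrides={"java_heap_mb": "2048"},
    role_overrides={"java_heap_mb": "4096"},
)
```

Here `resolved["java_heap_mb"]` is `"4096"` because the role-instance override is applied last, while `resolved["replication"]` keeps the service-wide value `"3"`.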
Cloudera Manager also collapses several levels of CDH configuration abstraction into one. For example, you can manage Java heap usage in the same place as Hadoop-specific parameters. Cloudera Manager is internally secure, and you can configure the Admin Console and Agents to connect with the Server over TLS.