Cloudera Manager and CDH Databases
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. Cloudera Manager supports a variety of databases for this purpose. To speed up simple installations, Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services (Hive, Hue, and Oozie) use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.
While the embedded database is a useful option for getting started quickly, Cloudera Manager also lets you use your own PostgreSQL, MySQL, or Oracle database for Cloudera Manager and for the CDH services that use databases. To learn more about database options, or if you are unsure whether the embedded database is right for your environment, continue with the following sections.
What Databases Must Be Installed
The Cloudera Manager Server and the server's Activity Monitor and Reports Manager all require databases, as do Cloudera Navigator and the Hive Metastore. The Host Monitor and Service Monitor have their own internal databases.
Cloudera provides three install paths:
- Path A automatically installs embedded PostgreSQL databases to meet the requirements of the services. This path reduces the number of installation tasks you must complete, as well as the number of choices to make.
- Path B and Path C require that you have databases in your environment for use by Cloudera Manager, Cloudera Management Services, and the Hive Metastore. These paths require more input and intervention, because you either install databases or gather information about existing databases. They also provide greater flexibility in choosing database types and configurations.
Cloudera Manager supports deploying different types of databases in a single environment, but doing so may create unexpected complications. Cloudera recommends choosing one of the three supported database providers to use for all of the Cloudera Manager databases.
In most cases, you should install databases and services on the same host. For example, if you create the database for Activity Monitor on myhost1, then you should typically assign the Activity Monitor role to myhost1. You assign the Activity Monitor and Reports Manager roles in the Cloudera Manager wizard during the install or upgrade process. After completing the install or upgrade process, you can also modify role assignments in the Management services pages of Cloudera Manager. Although the database location can be changed later, you should decide which hosts you will use before beginning an installation or upgrade. Note that the JDBC connector for your database must be installed on the hosts where you assign the Activity Monitor and Reports Manager roles.
It is possible to install the database and services on different hosts. Separating databases from services is more likely to occur in larger deployments and in cases where more sophisticated database administrators actively choose to establish such a configuration. For example, databases and services might be separated if your environment includes Oracle databases that will be separately managed by Oracle database administrators.
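The placement guidance above can be turned into a quick planning check. The following is a minimal sketch, not part of Cloudera Manager: the `PlannedDatabase` record, the `review_plan` function, and the host name `dbhost1` are hypothetical and exist only for illustration. It flags roles whose database is planned for a different host and plans that mix database types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedDatabase:
    """One row of a hypothetical deployment plan (not a Cloudera Manager API)."""
    role: str        # for example "Activity Monitor"
    role_host: str   # host that will run the role
    db_host: str     # host that will run the role's database
    db_type: str     # "postgresql", "mysql", or "oracle"


def review_plan(plan):
    """Return a list of human-readable warnings for a planned layout."""
    warnings = []
    # Flag every role whose database is planned for a different host.
    for entry in plan:
        if entry.role_host != entry.db_host:
            warnings.append(
                f"{entry.role}: role on {entry.role_host}, database on {entry.db_host}"
            )
    # Flag plans that use more than one database type.
    db_types = sorted({entry.db_type for entry in plan})
    if len(db_types) > 1:
        warnings.append("mixed database types: " + ", ".join(db_types))
    return warnings


if __name__ == "__main__":
    plan = [
        PlannedDatabase("Activity Monitor", "myhost1", "myhost1", "mysql"),
        PlannedDatabase("Reports Manager", "myhost2", "dbhost1", "mysql"),
    ]
    for warning in review_plan(plan):
        print(warning)
```

The sketch reports warnings rather than errors because separating databases from services is a supported configuration; the output is only a prompt to confirm that the separation is intentional.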
The table that follows provides a summary; details are in the sections that follow.
|Install or Upgrade Path||Install Supported Database For||Typically Install Databases on Systems That Will Host|
|Installation Path A - Automated Installation by Cloudera Manager||No installations required. Automated installation automatically creates embedded PostgreSQL databases for Cloudera Manager and all services.||No manual installation required.|
|Installation Path B - Installation Using Your Own Method and Installation Path C - Installation Using Tarballs||Cloudera Manager Server configuration and for Activity Monitor, Reports Manager, Hive Metastore, and Cloudera Navigator.||The Cloudera Manager Server, Activity Monitor, Reports Manager, and Cloudera Navigator roles, and the Hive Metastore. Alternatively, you may install these databases on other systems, provided those systems are accessible to the Cloudera Manager Server.|
|Upgrading Cloudera Manager||Activity Monitor, Reports Manager, the Hive Metastore, and Cloudera Navigator roles.||Activity Monitor, Reports Manager, the Hive Metastore, and Cloudera Navigator roles.|
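When a database runs on a separate system, that system must be accessible to the Cloudera Manager Server. One way to confirm basic network access before starting an installation is a TCP connection test run from the Cloudera Manager Server host. The following is a minimal sketch under stated assumptions: the `is_reachable` helper is hypothetical, and the ports listed are the database vendors' usual defaults, which your installation may have changed.

```python
import socket

# Usual default listener ports for the supported database types.
# Assumption: your databases have not been moved to other ports.
DEFAULT_PORTS = {"postgresql": 5432, "mysql": 3306, "oracle": 1521}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again as soon as the check succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_reachable("dbhost1", DEFAULT_PORTS["mysql"])` tests a MySQL listener on a placeholder host named dbhost1. A successful result shows only that something is listening on the port; it does not verify credentials, the JDBC connector, or the database itself.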
Cloudera Manager Server Database
The Cloudera Manager Server database, which is used for storing information about service configurations, is independent of the databases used by the Activity Monitor, Reports Manager, Cloudera Navigator, and the Hive Metastore.
|Automatic installation: Installation Path A - Automated Installation by Cloudera Manager||The wizard automatically installs, configures, and uses embedded PostgreSQL databases to store information about service configuration, as well as the Activity Monitor, Reports Manager, Cloudera Navigator, and the Hive Metastore. This functionality is provided by the cloudera-scm-server-db package, and you can start and stop these databases using the service cloudera-scm-server-db [start|stop] command. If you are using Installation Path A, you can proceed directly to Installation Path A - Automated Installation by Cloudera Manager.|
|Manual installation: Installation Path B - Installation Using Your Own Method or Installation Path C - Installation Using Tarballs||You must install a supported database. This database can be installed on the host where you install the Cloudera Manager Server or on a host accessible to the Cloudera Manager Server. You will need to configure the connection between Cloudera Manager and the database as described in the alternative installation paths.|
Databases for Hive, Hue, and Oozie
Configuring an External Database for the Hive Metastore
By default, Cloudera Manager uses the embedded PostgreSQL database for the Hive Metastore. If necessary, you can configure Cloudera Manager to use an external database for the Hive Metastore. Do this before you install Cloudera Manager. For more information, see the instructions in External PostgreSQL Database, MySQL Database, or Oracle Database.
Configuring an External Database for Hue
By default, Cloudera Manager uses SQLite for Hue's database. If you want to use an external database for Hue, configure it after Cloudera Manager is installed. For more information, see Using an External Database for Hue.
Configuring an External Database for Oozie
By default, Cloudera Manager uses Derby for the Oozie database. If you want to use an external database for Oozie, configure it after Cloudera Manager is installed. For more information, see Using an External Database for Oozie.