Installation Path A - Automated Installation by Cloudera Manager

If your cluster meets the requirements for Installation Path A, follow the instructions in this section for automated installation by Cloudera Manager. The requirements for Path A are:

  • Uniform SSH access to cluster hosts on the same port from the Cloudera Manager Server host.
  • All hosts must have access to standard package repositories.
  • All hosts must have access to either archive.cloudera.com on the Internet or a local repository with the necessary installation files.

The Cloudera Manager configuration, as well as other monitoring and management information, is stored in databases. As part of Installation Path A, Cloudera Manager installs embedded PostgreSQL databases. It is simplest to use these automatically installed and configured databases. During the installation, you are given the option to select databases other than the automatically installed ones. If you intend to customize the installation to use other databases, install and configure them before beginning Installation Path A.

Using custom databases is a more advanced process, which is more often a part of an Installation Using Your Own Method. For more information on installing custom databases, see Installing and Configuring Databases. Otherwise, use the embedded PostgreSQL database, which the installer creates.

The general steps in this procedure for Installation Path A are:

Step 1: Download and Run the Cloudera Manager Installer

  Important:

For installation purposes, the Cloudera Manager Server must have SSH access to the cluster hosts and you must log in using a root account or an account that has password-less sudo permission. See Requirements for Cloudera Manager for more information.

Cloudera Manager accesses archive.cloudera.com by using yum on Red Hat systems, zypper on SUSE systems, or apt-get on Debian/Ubuntu systems. If your hosts access the Internet through an HTTP Proxy, you can configure yum, zypper, or apt-get, system-wide, to access archive.cloudera.com through a proxy. To do so, modify the system configuration on the Cloudera Manager Server host and on every cluster host where you want to install CDH. This is not required in all cases.

To configure your system to use a proxy:

On Red Hat systems, add the following property to /etc/yum.conf:

proxy=http://server:port/

On SUSE systems, add the following property to /root/.curlrc:

--proxy=http://server:port/

On Debian/Ubuntu systems, add the following property to /etc/apt/apt.conf:

Acquire::http::Proxy "http://server:port";
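
As a rough sketch of applying one of the settings above without duplicating it on repeated runs, the helper below appends a line to a file only if that exact line is not already present. The function name add_line_once is hypothetical (it is not part of Cloudera Manager), and server:port remains a placeholder for your proxy address. Note that in /etc/yum.conf the proxy property belongs in the [main] section, so appending to the end of the file is only appropriate when [main] is the last (or only) section.

```shell
# add_line_once FILE LINE
# Hypothetical helper: append LINE to FILE unless an identical line is
# already there, so the proxy setting is not duplicated on repeated runs.
add_line_once() {
    file=$1
    line=$2
    # -x: match whole lines, -F: fixed string, -e: allow a LINE starting with "-"
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example calls (run as root; replace server:port with your proxy):
#   add_line_once /etc/yum.conf 'proxy=http://server:port/'
#   add_line_once /root/.curlrc '--proxy=http://server:port/'
#   add_line_once /etc/apt/apt.conf 'Acquire::http::Proxy "http://server:port";'
```

Repeat the call on the Cloudera Manager Server host and on every cluster host where you want to install CDH.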

To download and run the Cloudera Manager installer:

  1. Download cloudera-manager-installer.bin from the Cloudera Downloads page to the host where you want to install the Cloudera Manager Server. This host must either be part of your cluster or be accessible to your cluster over your network. Install Cloudera Manager on a single host.
  2. After downloading cloudera-manager-installer.bin, make it executable.
    $ chmod u+x cloudera-manager-installer.bin
  3. Run cloudera-manager-installer.bin.
    $ sudo ./cloudera-manager-installer.bin

      Note:

    This installer's default behavior is to install packages from the Internet. If you have created a local repository and configured your machine to recognize that repository, you can instruct the installer to use local repositories by running cloudera-manager-installer.bin with the --skip_repo_package=1 option.

  4. Read the Cloudera Manager Readme and then press Enter to choose Next.
  5. Read the Cloudera Manager License and then press Enter to choose Next. Use the arrow keys and press Enter to choose Yes to confirm you accept the license.
  6. Read the Oracle Binary Code License Agreement and then press Enter to choose Next. Use the arrow keys and press Enter to choose Yes to confirm you accept the Oracle Binary Code License Agreement. The Cloudera Manager installer begins installing the Oracle JDK and the Cloudera Manager repo files and then installs the packages. The installer also installs the Cloudera Manager Server.
      Note:
    If the error message "Failed to start server" appears while running cloudera-manager-installer.bin, exit the installation program. If the Cloudera Manager Server log file /var/log/cloudera-scm-server/cloudera-scm-server.log contains the following message, SELinux is probably enabled:
    Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
            at java.net.URLClassLoader$1.run(Unknown Source)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(Unknown Source)
            at java.lang.ClassLoader.loadClass(Unknown Source)
            ...
    You can disable SELinux by running the following command on the Cloudera Manager Server host:
    $ sudo setenforce 0

    To disable it permanently, edit /etc/selinux/config.

  7. Note the complete URL provided for the Cloudera Manager Admin Console, including the port number, which is 7180 by default. Click OK to continue.
  8. Click OK to exit the installer.
  Note:

If the installation is interrupted for some reason, you may need to clean up before you can re-run it. See Uninstalling Cloudera Manager.
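
To make the SELinux note in step 6 more concrete, here is a minimal sketch for inspecting the SELinux state before you rerun the installer. The getenforce command and the SELINUX= line in /etc/selinux/config are standard on SELinux-enabled systems; the function name selinux_config_mode is hypothetical and only illustrates how to read the persistent setting.

```shell
# selinux_config_mode [FILE]
# Hypothetical helper: print the persistent SELinux mode recorded in FILE
# (default /etc/selinux/config), that is, the value of its SELINUX= line.
selinux_config_mode() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Typical use on the Cloudera Manager Server host:
#   getenforce              # current mode: Enforcing, Permissive, or Disabled
#   sudo setenforce 0       # switch to permissive mode until the next reboot
#   selinux_config_mode     # mode that will apply after a reboot
# To make the change permanent, set SELINUX=permissive or SELINUX=disabled
# in /etc/selinux/config.
```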

Step 2: Start the Cloudera Manager Admin Console

The Cloudera Manager Admin Console enables you to use Cloudera Manager to configure, manage, and monitor Hadoop on your cluster. Before using the Cloudera Manager Admin Console, gather information about the server's URL and port.

The server URL takes the following form:

http://<Server host>:<port>

<Server host> is the fully-qualified domain name or IP address of the host machine where the Cloudera Manager Server is installed. <port> is the port configured for the Cloudera Manager Server. The default port is 7180. For example, use a URL such as the following:

http://myhost.example.com:7180/
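
The URL form above can be sketched as a pair of small shell helpers: one builds the URL from a host and an optional port, and the other polls it until the server responds. Both function names are hypothetical, and the second assumes curl is installed on the machine you run it from.

```shell
# cm_console_url HOST [PORT]
# Hypothetical helper: print the Admin Console URL for HOST, using the
# default port 7180 when PORT is omitted.
cm_console_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-7180}"
}

# wait_for_console URL [TRIES]
# Hypothetical helper: poll URL with curl, pausing one second between
# attempts, until the server answers or TRIES attempts (default 30) are used up.
wait_for_console() {
    url=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example:
#   wait_for_console "$(cm_console_url myhost.example.com)" && echo "console is up"
```

A check like this only shows that something is answering on the port; you still log in through a web browser.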

Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the wizard in the next section. While you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.

To start the Cloudera Manager Admin Console:

In a web browser, enter the URL, including the port, for the Cloudera Manager Server. The login screen for Cloudera Manager appears.

Log in to Cloudera Manager. The default credentials are:

  • Username: admin
  • Password: admin

Step 3: Use Cloudera Manager for Automated CDH Installation and Configuration

The following instructions show you how to use the Cloudera Manager wizard to do an initial installation and configuration. The wizard helps you to install and set up Cloudera packages across your cluster and will:

  • Install and validate your Cloudera Manager License
  • Find the cluster hosts you specify via hostname and IP-address ranges
  • Connect to each host with SSH to install the Cloudera Manager Agent and CDH (including Hue)
  • Install the Oracle JDK on the cluster hosts (if not already installed)
  • Install CDH packages or parcels, optionally including the Cloudera Impala package or parcel
  • Configure Hadoop automatically and start the Hadoop services
  Important:
All hosts in the cluster must have some way to access installation files. This can be done in one of two ways:
  • Internet access to allow the wizard to install software packages or parcels from archive.cloudera.com.
  • An internal repository that each host can access. For example, for Red Hat hosts, you could set up a Yum repository. See Creating and Using your own Repository for more information.

To use Cloudera Manager:

  1. The first time you start the Cloudera Manager Admin Console, the install wizard starts up.
  2. Browse to your Cloudera Manager License file. If you don't install the license now, Cloudera Manager Free Edition will be installed.
      Note:

    The instructions that follow assume you have installed a Cloudera Manager license. If you are not yet ready to install a Cloudera Manager license, and want to proceed with a Free Edition installation, stop here and use the Cloudera Manager Free Edition Installation Guide instead. If you install the Free Edition, and later need to upgrade to the full version of Cloudera Manager, follow the instructions under Upgrading from Cloudera Manager Free Edition 4.5 to the Cloudera Manager Enterprise Edition.

  3. After you install the Cloudera Manager license, restart the Cloudera Manager server. The command is the same on Red Hat/CentOS/SUSE and Debian/Ubuntu systems:
    $ sudo service cloudera-scm-server restart
  4. After the Cloudera Manager server restarts, use your web browser to connect to the Cloudera Manager Admin Console URL again and log in, as described in Step 2.
      Note:

    After restarting the server, wait a few seconds for the server to finish initializing before you try to reconnect to the Admin Console.

  5. Information is displayed indicating what the CDH installation includes. Click Continue.
  6. To enable Cloudera Manager to automatically discover the cluster hosts where you want to install CDH, enter the cluster hostnames or IP addresses. You can also specify hostname and IP address ranges. For example:

    Use this Expansion Range    To Specify these Hosts
    10.1.1.[1-4]                10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
    host[1-3].company.com       host1.company.com, host2.company.com, host3.company.com
    host[07-10].company.com     host07.company.com, host08.company.com, host09.company.com, host10.company.com

    You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges. The scan results will include all addresses scanned, but only scans that reach hosts running SSH will be selected for inclusion in your cluster by default.

      Note:

    If you don't know the IP addresses of all of the hosts, you can enter an address range that includes unused addresses and then, later in this procedure, deselect the addresses where no host was discovered. However, keep in mind that wider ranges take more time to scan.

  7. Click Search. Cloudera Manager identifies the hosts on your cluster to allow you to configure them for CDH. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, add their host name or IP address and click Search again.
      Note:

    Cloudera Manager scans hosts by checking for network connectivity. If there are some hosts where you want to install CDH that are not shown in the list, make sure you have network connectivity between the Cloudera Manager Server host and those hosts. Common causes of loss of connectivity are firewalls and interference from SELinux.

  8. Verify that the number of hosts shown matches the number of hosts where you want to install CDH. Deselect host entries that do not exist and deselect the hosts where you do not want to install CDH. Click Install CDH On Selected Hosts.

    Click Continue

  9. Select the repository type you want to use for the installation.

    Installing from parcels is recommended, if they are available for the version you want to install.

      Note:

    Parcels are available for CDH4.1.2 or later, and for Impala. To install CDH3 or to install an earlier version of CDH4, select Packages.
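
The expansion ranges described in step 6 can be illustrated with a small sketch. It is not Cloudera Manager's own implementation: it assumes a single [start-end] range per pattern and assumes that zero-padding follows the width of the start value, which is consistent with the examples in step 6 (host[07-10] yields host07 through host10). The function name expand_host_range is hypothetical.

```shell
# expand_host_range PATTERN
# Hypothetical helper: print one host per line for a pattern containing a
# single [start-end] range. Patterns without a range are printed unchanged.
expand_host_range() {
    pattern=$1
    case $pattern in
        *\[*-*\]*) ;;                           # contains a range: expand it below
        *) printf '%s\n' "$pattern"; return 0 ;;
    esac
    prefix=${pattern%%\[*}                      # text before "["
    rest=${pattern#*\[}
    range=${rest%%\]*}                          # "start-end"
    suffix=${rest#*\]}                          # text after "]"
    start=${range%%-*}
    end=${range#*-}
    width=${#start}                             # pad numbers to the width of start
    # Strip leading zeros so the shell does not treat "08" as octal.
    first=$(printf '%s' "$start" | sed 's/^0*//')
    last=$(printf '%s' "$end" | sed 's/^0*//')
    i=${first:-0}
    while [ "$i" -le "${last:-0}" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + 1))
    done
}

# Example:
#   expand_host_range 'host[07-10].company.com'
```

Previewing a range this way before you click Search can help you avoid scanning more addresses than you intended.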

Installation using Parcels

  1. Choose the parcel you want to install. The choices you see depend on the repositories you have chosen – a repository may contain multiple parcels.

    If you have parcels in a custom repository, you can specify the repository and Cloudera Manager will add those parcels to the list shown on this page.

    1. Click More Options to show the custom repository field.
    2. Enter the URL of the repository you want into the field provided, and click the + Add button. The URL you specify here will also be added to the list of remote repositories referenced in the Remote Parcel Repository URLs property. If you have multiple repositories configured, you will see all the unique parcels contained in all your repositories.
  2. Select the specific release of Cloudera Manager to install on your hosts. You may choose either the version that matches the Cloudera Manager Server you are currently using, or an installation from a custom repository.

  3. If available, select the specific release of Impala to install on your hosts. You may choose either the latest version or use a custom repository. If you do not want to install Impala, select None.

  4. If you opted to use custom repositories for installation files, you may provide a GPG key URL that will apply for all repositories.

  5. Click Continue. You are now asked to provide your credentials, following the instructions at Provide credentials for authenticating with hosts.

Installation using Packages

  1. Choose the CDH version to install.
  2. Select the major release of CDH to install. This is often CDH4.
  3. Select the specific release of CDH to install from within the major version you selected. You may choose a custom repository.
  4. Select the specific release of Impala to install on your hosts. You may choose either the latest version or use a custom repository.
  5. Select the specific release of Cloudera Manager to install on your hosts. You may choose either the version that matches the Cloudera Manager Server you are currently using, or an installation from a custom repository.
  6. If you opted to use custom repositories for installation files, you may provide a GPG key URL that will apply for all repositories.
  7. Click Continue.

Provide credentials for authenticating with hosts

  1. Select root or enter the user name for an account that has password-less sudo permissions.
  2. Select an authentication method.
    • If you choose to use password authentication, enter and confirm the password.
    • If you choose to use public-key authentication, provide a passphrase and the path to the required key files.
    • You can choose to specify an alternate SSH port. The default value is 22.
    • You can specify the maximum number of host installations to run at once. The default value is 10.
  3. Click Continue to begin installing the Cloudera Manager Agent and Daemons on the cluster hosts. If you are installing from packages, the process also installs CDH (and Impala, if you've selected it) on your hosts.
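
Before you click Continue, it can help to confirm that the account you entered really has the access the wizard needs. The sketch below is one possible pre-check, not something Cloudera Manager provides: it uses standard OpenSSH options (-p, BatchMode, ConnectTimeout) and sudo -n to test for a non-interactive login and password-less sudo. Because BatchMode disables password prompts, it only applies when you use public-key authentication. The function name check_host_access and the SSH_CMD variable are hypothetical.

```shell
# check_host_access USER HOST [PORT]
# Hypothetical pre-check: confirm that USER can log in to HOST over SSH
# without prompting and can run sudo without a password.
# Set SSH_CMD to substitute another command for ssh (for example, in a dry run).
check_host_access() {
    "${SSH_CMD:-ssh}" -p "${3:-22}" -o BatchMode=yes -o ConnectTimeout=5 \
        "$1@$2" 'sudo -n true'
}

# Example, run from the Cloudera Manager Server host:
#   for h in host1.company.com host2.company.com; do
#       check_host_access root "$h" || echo "cannot reach or sudo on $h"
#   done
```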

Install Cloudera Manager and CDH components

The status of installation on each host is displayed on the following screen. The Cloudera Manager wizard uses SSH to access the cluster hosts and follows a sequence of steps to download and install the Oracle JDK and the Cloudera Manager Agents and Daemons. If you are installing from packages rather than parcels, CDH is also installed at this step.

  Note:

Clicking Abort Installation while installation is in progress halts any pending or in-progress installations and rolls back any in-progress installations to a clean state. Clicking Abort Installation does not affect completed or failed host installations.

If installation fails on a host, you can click the Uninstall link next to the failed host. This gives you the choice of uninstalling the failed hosts or retrying installation on them. To uninstall, click Uninstall Failed Hosts. To retry installation on all failed hosts, click Retry Failed Hosts.

To avoid excessive network load, the wizard runs a limited number of installations in parallel, based on the value indicated on the page where you provided your authentication credentials. The default is 10 simultaneous installations.

  1. If you are installing from packages, the wizard configures package repositories, installs the Oracle JDK, CDH, and the Cloudera Manager Agent, and then starts the Cloudera Manager Agent. The status of installation on each host is displayed. You can also click the Details link for individual hosts to view detailed information about the installation and error messages if installation fails on any hosts.
    1. When the Continue button appears at the bottom of the screen, the installation process is completed. If the installation has completed successfully on some hosts but failed on others, you can click Continue if you want to skip installation on the failed hosts and continue to the next screen to start configuring CDH on the successful hosts.
  2. If you are installing from parcels, the wizard installs the Oracle JDK and the Cloudera Manager Agent using packages, as described above. The status of installation on each host is displayed.
    1. When the Cloudera Manager Agent, the JDK, and so on have been installed, click Continue to proceed to the cluster installation section. During the parcel installation, progress is indicated for the three phases of the parcel installation process (Download, Distribution, and Activation) in a single progress bar. If you are installing multiple parcels (for example, CDH and Impala), you will see a progress bar for each parcel.
    2. When the Continue button appears at the bottom of the screen, the installation process is completed.
  3. When you continue, the Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Continue.

Choose the services you want to start on your cluster

  1. Choose which version of CDH to use.
  2. Choose the combination of services to install: Core Hadoop, Real-Time Delivery (previously known as HBase Services), Real-Time Query (which includes HDFS, Hive and Impala), All Services, or Custom Services.
      Note:
    • Some services depend on others; for example, HBase requires HDFS and ZooKeeper.
    • Most of the combinations install MapReduce v1. Choose the Custom Services option to install MapReduce v2 (YARN) or use the Add Service functionality to add YARN after installation completes.
  3. Choose whether to install Cloudera Navigator. Cloudera Navigator is independently licensed from the core Cloudera Enterprise offering.
  4. Click Inspect Role Assignments to see how the wizard will assign roles for the services you have chosen, and change them if you need to. The wizard evaluates the hardware configurations of the cluster hosts to determine the best machines for each role. For example, the wizard assigns the NameNode role to the machine that best meets the NameNode requirements. The wizard also configures other options, such as the number of map and reduce slots for TaskTracker, on the basis of the size of the cluster and the physical characteristics of each machine, such as the number of CPUs, amount of RAM, and disk space. These assignments are typically acceptable, but you can reassign services to nodes of your choosing, if desired.
  5. Click Continue when you are satisfied with the assignments.
  6. On the Database Setup page, configure settings for the Activity Monitor, Service Monitor, Report Manager, Host Monitor, and Hive metastore databases.
    • Leave the default settings of Use Embedded Database to have Cloudera Manager create and configure all required databases.
    • Select Custom to specify external databases, and enter the required information for the databases that you created when you set up your databases for Cloudera Manager. You must provide the database host, database type, database name, username, and password.
    • Click Test Connection to confirm that Cloudera Manager can communicate with the databases using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise check and correct the information you have provided for the databases and then try the test again. (Note that for Hive, if you are using the embedded database, you may see a message saying the connection will be created at a later point in the installation process.)
  7. Review the Configuration Changes to be applied.

    Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. For example, you might confirm the NameNode Data Directory and the DataNode Data Directory for HDFS or confirm the TaskTracker Local Data Directory List or JobTracker Local Data Directory for MapReduce.

  8. Click Continue. The wizard starts the services on your cluster.
  9. When all of the services are started, click Continue. You will see a success message indicating that your cluster has been successfully started.
  10. Click Continue to proceed to the Cloudera Manager Services page.

Step 4: Change the Default Administrator Password

As soon as possible after running the wizard and beginning to use Cloudera Manager, you should change the default administrator password.

To change the administrator password:

  1. Click the gear icon to display the Administration page.
  2. Click the Users tab.
  3. Click the Change Password button next to the admin account.
  4. Enter a new password twice and then click Submit.

Step 5: Test the Installation

Now that you have finished with the CDH and Cloudera Manager installation, you are ready to test the installation. For testing instructions, see Testing the Installation.

  Note:

If you change the hostname or port where Cloudera Manager is running, or you enable TLS security, you must restart the Cloudera Management Services to update the URL to the Server. For instructions, see Restarting a Service.