This CDH4 Cluster Installation Guide is for Hadoop developers and system administrators who need to install a Hadoop cluster. The following sections describe how to install and configure version 4 of Cloudera's Distribution Including Apache Hadoop (CDH4).
This document helps you configure a multi-node Hadoop cluster and tune it for better performance. It also contains best practices for HBase and MapReduce configuration.
1) Summary
2) Prerequisites
A. Supported Operating System
B. Software
C. Unique Host name
3) Introduction to Cloudera Manager Installation
4) Preparation for installation
A. Networking
B. Firewall and Security
I. Configure the SSH
II. Check SSH_config file setting
III. Turn off iptables
C. Disable SELINUX
D. Proxy Setting
1.1.1 In terminal
E. Host Entry
F. Update OS Setup
G. Download Cloudera Manager
5) Setting up the Cloudera Manager
Step 1: Registration Document
Step 2: Specify Host for installation
Step 3: Connecting Specified hosts With SSH
Step 4: Choose CDH Version
Step 5: Provide SSH Login Credentials
Figure 5: SSH/Login Credentials
Step 6: Installation Done
Figure 6: Installation on nodes
Step 7: Inspect hosts for correctness
Figure 7: Check hosts for correctness.
Step 8: Choose services to install on the cluster
Figure 8: Choose services to Install.
Step 9: Inspect Role Assignments
Figure 9: Role Assignments.
Step 10: Review the Configuration
Figure 10: Configuration for your Cluster
Step 11: Change the Default Administrator Password
Step 12: Test the Installation
Figure 11: All Services
1) Summary
This CDH4 Cluster Installation Guide is for Hadoop developers and system administrators who need to install a Hadoop cluster. The following sections describe how to install and configure version 4 of Cloudera's Distribution Including Apache Hadoop (CDH4).
This document helps you configure a multi-node Hadoop cluster and tune it for better performance. It also contains best practices for HBase and MapReduce configuration.
2) Prerequisites
A. Supported Operating System
CDH4 supports the following operating systems:
- Red Hat-compatible systems
· Red Hat Enterprise Linux 5.7 and CentOS 5.7
· Red Hat Enterprise Linux 6.2 and CentOS 6.2
· Oracle Enterprise Linux 5.6 with Unbreakable Enterprise Kernel
- SLES systems
· SUSE Linux Enterprise Server 11. Service Pack 1 or later is required.
- Debian systems
· Debian 6.0 (Squeeze)
- Ubuntu systems
· Ubuntu 10.04
· Ubuntu 12.04
Cloudera Manager supports only 64-bit operating systems.
B. Software
o Perl
o SSH
o openssh-server
o openssh-clients
o Cloudera Manager
C. Unique Host name
o The host name of each machine must be unique in your network, e.g. node1.example.com
3) Introduction to Cloudera Manager Installation
Cloudera Manager automates the installation and configuration of CDH on an entire cluster, requiring only that you have root SSH access to your cluster's machines, and access to the internet or a local repository with installation files for all these machines.
It consists of the following components:
· Cloudera Manager Server
· Cloudera Manager Agent
· PostgreSQL database
How Cloudera Manager works:
o Using SSH, discover the cluster hosts you specify via IP address ranges or hostnames
o Configure the package repositories for Cloudera Manager, CDH, and the Oracle JDK
o Install the Cloudera Manager Agent and CDH (including Hue) on the cluster hosts
o Install the Oracle JDK if it's not already installed on the cluster hosts
o Determine the mapping of services to hosts
o Suggest a Hadoop configuration and start the Hadoop services
You can also add nodes to the cluster and remove nodes from it.
4) Preparation for installation
A. Networking
o Check the internet settings on every host
o Cluster hosts must use the same DNS, with forward and reverse DNS properly configured
o Check that each hostname follows the standard fully qualified form, e.g. node1.example.com
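The forward and reverse lookup requirement above can be checked from the command line. The following is a minimal sketch, assuming a Linux host where getent is available; check_dns is a hypothetical helper name and node1.example.com is only a placeholder.

```shell
# check_dns HOST: report whether HOST resolves to an address and back to a name.
# Hypothetical helper for illustration; getent uses the system resolver,
# so entries in /etc/hosts count as well as DNS records.
check_dns() {
    host="$1"
    ip=$(getent hosts "$host" | awk '{ print $1; exit }')
    if [ -z "$ip" ]; then
        echo "FAIL $host: no forward lookup"
        return 1
    fi
    name=$(getent hosts "$ip" | awk '{ print $2; exit }')
    if [ -z "$name" ]; then
        echo "FAIL $host: no reverse lookup for $ip"
        return 1
    fi
    echo "OK $host -> $ip -> $name"
}
```

Running check_dns node1.example.com on every host, once per cluster hostname, should print an OK line whose last field is the fully qualified hostname.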
B. Firewall and Security
o The Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation wizard.
Note
You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you will need to either enter the password or upload a public and private key pair for the root or sudo user account.
Cloudera Manager uses SSH only during the initial install or upgrade. Once your cluster is set up, you can safely disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials and all credential information is discarded once the installation is complete.
I. Configure the SSH
Steps:
a) ssh-keygen
b) cd /root/.ssh/
c) ls
d) cp id_rsa.pub authorized_keys
e) Concatenate the id_rsa.pub files of all hosts into authorized_keys and store that file in the same location (/root/.ssh/) on every host, so that every machine can SSH to the others.
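Step e) can be sketched as a small shell function. This is only an illustration, assuming the id_rsa.pub files of all hosts have already been collected into one directory; merge_pubkeys and the keys/ directory are hypothetical names.

```shell
# merge_pubkeys OUT KEY.pub...: add each public key to OUT exactly once.
# Hypothetical helper; OUT would normally be /root/.ssh/authorized_keys.
merge_pubkeys() {
    out="$1"
    shift
    touch "$out"
    # awk 'NF' drops blank lines and ends every key with a newline
    awk 'NF' "$@" >> "$out"
    # remove duplicate keys (this also sorts the file, which is harmless)
    sort -u -o "$out" "$out"
    # keep the file private; sshd in strict mode ignores key files others can write to
    chmod 600 "$out"
}
```

For example, merge_pubkeys /root/.ssh/authorized_keys keys/*.pub builds the combined file, which can then be copied to /root/.ssh/ on every host. Where it is installed, ssh-copy-id offers another way to push a single key to a remote host.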
II. Check SSH_config file setting
When using multiple systems the indispensable tool is, as we all know, ssh. Using ssh you can login to other (remote) systems and work with them as if you were sitting in front of them. Even if some of your systems exist behind firewalls you can still get to them with ssh, but getting there can end up requiring a number of command line options and the more systems you have the more difficult it gets to remember them. However, you don't have to remember them, at least not more than once: you can just enter them into ssh's config file and be done with it.
Steps
1) vi /etc/ssh/ssh_config
2) Change the value of StrictHostKeyChecking from ask to no, i.e. StrictHostKeyChecking no
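The same change can be made without an editor. The sketch below is an assumption-laden illustration, not part of the Cloudera procedure: set_strict_host_key_checking is a hypothetical helper, and it expects a sed that supports -i and -E (GNU, BSD, and BusyBox sed do).

```shell
# set_strict_host_key_checking FILE VALUE: set the option in an ssh_config-style file.
# Hypothetical helper; the original file is kept as FILE.bak.
set_strict_host_key_checking() {
    file="$1"
    value="$2"
    if grep -Eq '^[#[:space:]]*StrictHostKeyChecking[[:space:]]' "$file"; then
        # rewrite the existing line, even if it is commented out
        sed -i.bak -E "s/^[#[:space:]]*StrictHostKeyChecking[[:space:]].*/    StrictHostKeyChecking $value/" "$file"
    else
        # no such line yet: append one at the end of the file
        cp "$file" "$file.bak"
        printf '    StrictHostKeyChecking %s\n' "$value" >> "$file"
    fi
}
```

A call such as set_strict_host_key_checking /etc/ssh/ssh_config no applies the setting from step 2. With the value no, ssh accepts unknown host keys without asking, which is convenient during installation but weakens protection against host impersonation, so it may be worth restoring ask afterwards.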
III. Turn off iptables
iptables is the administration tool/command for IPv4 packet filtering and NAT. To turn the firewall off, you need the following tools:
[a] service is a command that runs a System V init script. It is used to save, stop, and start the firewall service.
[b] chkconfig updates and queries run-level information for system services. It is a system tool for maintaining the /etc/rc*.d hierarchy. Use this tool to disable the firewall service at boot time.
Steps
a) service iptables save
b) service iptables stop
c) chkconfig iptables off
d) /etc/init.d/network restart
C. Disable SELINUX
Steps to disable SELinux:
o vi /etc/selinux/config
SELINUX=disabled
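For scripted setups, the edit above can be sketched as follows. disable_selinux_in is a hypothetical helper name, and the sketch assumes the file already contains a SELINUX= line, as the stock /etc/selinux/config does.

```shell
# disable_selinux_in FILE: set SELINUX=disabled in an SELinux config file.
# Hypothetical helper; FILE would normally be /etc/selinux/config.
disable_selinux_in() {
    file="$1"
    # only the SELINUX= line is changed; SELINUXTYPE= is left alone
    sed -i.bak -E 's/^SELINUX=.*/SELINUX=disabled/' "$file"
    # succeed only if the setting is now present
    grep -q '^SELINUX=disabled$' "$file"
}
```

The setting in this file takes effect at the next reboot; getenforce shows the mode that is currently active.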
D. Proxy Setting
Check "Use this proxy server for all protocols".
1.1.1 In terminal
sudo gedit /etc/yum.conf
# The proxy server - proxy server:port number
proxy=http://proxy1.xx.com:8080
# the account details for yum connections
proxy_username=<username>
proxy_password=<your password>
Then save the file
sudo yum clean all
E. Host Entry
This snippet describes the format of the /etc/hosts file. This file is a simple text file that associates IP addresses with hostnames, one line per IP address. You should have some subset of all hostnames in /etc/hosts. You should have some sort of name resolution, even when no network interfaces are running, for example, during boot time. This is not only a matter of convenience, but it allows you to use symbolic hostnames in your network RC scripts. Thus, when changing IP addresses, you only have to copy an updated hosts file to all machines and reboot, rather than edit a large number of RC files separately. Usually you put all local hostnames and addresses in hosts, adding those of any gateways and NIS servers used. For each host a single line should be present with the following information:
sudo gedit /etc/hosts
127.0.0.1 localhost
10.xxx.xxx.xxx master.example.com master
10.xxx.xxx.xxx node01.example.com node01
10.xxx.xxx.xxx node02.example.com node02
10.xxx.xxx.xxx node03.example.com node03
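Before copying the hosts file to every machine, it can be given a quick sanity check. The following sketch is illustrative only: check_hosts_file is a hypothetical helper that looks for the two problems most relevant here, an address without a hostname and a missing localhost line.

```shell
# check_hosts_file FILE: basic sanity checks for an /etc/hosts-style file.
# Hypothetical helper; prints each problem and returns non-zero if any is found.
check_hosts_file() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }    # skip blank lines and comments
        NF < 2 {
            printf "line %d: address without a hostname\n", NR
            bad = 1
            next
        }
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if ($i == "localhost") has_localhost = 1
        }
        END {
            if (!has_localhost) { print "no 127.0.0.1 localhost entry"; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}
```

Running check_hosts_file /etc/hosts on each node prints nothing and returns 0 when both checks pass.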
F. Update OS Setup
Updating the OS is not necessary, but it is better to use the latest stable version.
Check that the fastest-mirror plugin is enabled for the update:
vi /etc/yum/pluginconf.d/fastestmirror.conf
Set enabled=1
yum -y update
G. Download Cloudera Manager
If you have curl installed, then use
or click on the link
If the link is not working, then go to
and download the latest stable free version.
Make the installer executable and run it:
chmod +x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
Accept the license agreements and click OK.
After some time, Cloudera Manager will be installed and available at:
http://master.example.com:7180
To start the Cloudera Manager Admin Console:
1. In a web browser, enter the URL, including the port, for the Cloudera Manager Server. The login screen for Cloudera Manager appears.
2. Log into Cloudera Manager. The default credentials are:
Username: admin
Password: admin
5) Setting up the Cloudera Manager
Follow these steps:
Step 1: Registration Document
· You can register with Cloudera and click Submit Registration.
· Just click Proceed.
Figure 1: Registration Document
Step 2: Specify Host for installation
To enable Cloudera Manager to automatically discover your cluster hosts where you want to install CDH, enter the cluster hostnames or IP addresses and click Search. You can also specify hostname and IP address ranges:
Use This Expansion Range | To Specify These Hosts |
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
host[1-3].network.com | host1.network.com, host2.network.com, host3.network.com |
host[07-10].network.com | host07.network.com, host08.network.com, host09.network.com, host10.network.com |
You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges.
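To preview which hosts a range pattern covers before pasting it into the wizard, the expansion rule from the table can be sketched in shell. This is an illustration of the rule only, not Cloudera Manager's own parser; expand_range is a hypothetical helper that handles one [start-end] range per pattern.

```shell
# expand_range PATTERN: print one host per line for a pattern such as host[07-10].network.com.
# Hypothetical helper; patterns without a [start-end] range are printed unchanged.
expand_range() {
    printf '%s\n' "$1" | awk '{
        if (match($0, /\[[0-9]+-[0-9]+\]/)) {
            prefix = substr($0, 1, RSTART - 1)
            suffix = substr($0, RSTART + RLENGTH)
            split(substr($0, RSTART + 1, RLENGTH - 2), bound, "-")
            # keep zero padding when the lower bound is written with a leading zero
            if (bound[1] ~ /^0[0-9]/)
                fmt = "%s%0" length(bound[1]) "d%s\n"
            else
                fmt = "%s%d%s\n"
            for (i = bound[1] + 0; i <= bound[2] + 0; i++)
                printf fmt, prefix, i, suffix
        } else {
            print
        }
    }'
}
```

For example, expand_range 'host[07-10].network.com' prints host07.network.com through host10.network.com, one per line, matching the last row of the table.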
The scan results will include all addresses scanned, but only scans that reach hosts running SSH will be selected for inclusion in your cluster by default.
Note: If you don't know the IP addresses of all of the hosts, you can enter an address range that spans over unused addresses. Note that larger ranges will require more time to scan.
You can abort any actively running scan by clicking Abort Scan. To find additional hosts after scanning completes, add or modify the hostname or IP address ranges and click Search again.
Figure 2: Specify Host for Installation
Step 3: Connecting Specified hosts With SSH
This step only checks whether the specified hosts are reachable over SSH from the starting node. If a host cannot be reached from that node, it is shown as not reachable. If SSH is not running on a host, the wizard shows "Could not connect to host". The wizard also shows the response time of each host it connects to.
Figure 3: Connecting Specified hosts With SSH
Step 4: Choose CDH Version
This screen shows the available versions of CDH. It also supports offline installation with a custom repository.
Figure 4: Choose CDH Version
Step 5: Provide SSH Login Credentials
To authenticate with the hosts, you must either use a root account that is on all of your cluster hosts, or use an account that has password-less sudo permissions. Select root or enter the user name for an account that has password-less sudo permissions. You can either use a shared password for the account, or use a public and private key pair.
To enter a password, click "All hosts accept same password" and enter the account password. To use a public and private key pair, click "All hosts accept same public key". Specify or browse for the location of the public and private keys. If your keys are protected by a passphrase, enter it.
Step 6: Installation Done
The wizard runs a maximum of 10 installations in parallel to avoid excessive network load. The status of installation on each host is displayed on the page that appears after you click Start Installation. You can also click the Details link for individual hosts to view detailed information about the installation and error messages if installation fails on any hosts.
If you click the Abort Installation button while installation is in progress, it will halt any pending or in-progress installations and roll back any in-progress installations to a clean state. The Abort Installation button does not affect host installations that have already completed successfully or already failed.
If installation fails on a host, you can click the Retry link next to the failed host to try installation on that host again. To retry installation on all failed hosts, click Retry Failed Hosts at the bottom of the screen.
When the Continue button appears at the bottom of the screen, the installation process is complete.
If the installation has completed successfully on some hosts but failed on others, you can click Continue if you want to skip installation on the failed hosts and continue to the next screen to start installing the Cloudera Management services on the successful hosts.
Step 7: Inspect hosts for correctness
When you continue, the Host Inspector runs to validate the installation, and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Continue.
Error 1: Clock is not synchronized. Synchronize the clock with the Cloudera Manager Server node.
Error 2: /etc/hosts file error. There is an error in the hosts file.
Error 3: Localhost error. The localhost line has not been added to the /etc/hosts file.
Step 8: Choose services to install on the cluster
Choose the services you want to start on your cluster.
· Choose which version of CDH to use.
· Choose the combination of services to install: Core Hadoop, HBase Services, All Services, or Custom Services.
Some services depend on others; for example, HBase requires HDFS and ZooKeeper. Most of the combinations install MapReduce v1. Choose the custom option to install MapReduce v2 (YARN) or use the Add Service functionality to add YARN after installation completes.
Step 9: Inspect Role Assignments
Click Inspect Role Assignments to see how the wizard will assign roles for the services you have chosen, and change them if you need to. These assignments are typically acceptable, but you can reassign services to nodes of your choosing, if desired. The wizard evaluates the hardware configurations of the cluster hosts to determine the best machines for each role. For example, the wizard assigns the NameNode role to the machine that best meets the NameNode requirements. The wizard also configures other options, such as the number of map and reduce slots for each TaskTracker, on the basis of the size of the cluster and the physical characteristics of each machine, such as the number of CPUs, amount of RAM, and disk space.
Click Continue when you are satisfied with the assignments.
Step 10: Review the Configuration
Review the configuration changes to be applied.
· Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. For example, you might confirm the NameNode Data Directory and the DataNode Data Directory for HDFS or confirm the TaskTracker Local Data Directory List or JobTracker Local Data Directory for MapReduce.
· Click Continue.
· The wizard starts the services on your cluster.
· When all of the services are started, click Continue.
· Click Continue.
Step 11: Change the Default Administrator Password
· As soon as possible after running the wizard and beginning to use Cloudera Manager, you should change the default administrator password.
· To change the administrator password:
· Click the gear icon in the top right corner to display the Administration page.
· Click the Users tab.
· Click the Change Password button next to the admin account.
· Enter a new password twice and then click Submit.
Step 12: Test the Installation