In this lesson, we will discuss setting up a Hadoop multi-node cluster. It is part of the Big Data and Hadoop training course offered by OnlineITguru.
Installing Java
Check whether Java is installed on each node with the java -version command.
$ java -version
If Java is installed, output similar to the following is displayed.
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
Creating User Account
Create a system user account on both the master and slave systems to run the Hadoop installation.
# useradd hadoop
# passwd hadoop
Mapping the nodes
Edit the hosts file in the /etc/ folder on all nodes, and specify the IP address of each system followed by its host name.
# vi /etc/hosts
Enter the following lines in the /etc/hosts file.
192.168.1.109 hadoop-master
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2
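One way to confirm that a node's hosts file carries every entry is a small check like the one below. It is only a sketch: the function name missing_hosts is made up for illustration, and it assumes the usual hosts format of an IP address followed by one or more names on each line.

```shell
# Print every host name from the argument list that has no entry in the
# given hosts-format file. Prints nothing when all names are present.
# (missing_hosts is a hypothetical helper, not part of Hadoop.)
missing_hosts() (
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for the name in any column after the IP.
        if ! awk -v host="$name" '
                /^[ \t]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
)

# Example, to be run on each node:
#   missing_hosts /etc/hosts hadoop-master hadoop-slave-1 hadoop-slave-2
```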
Configuring Key Based Login
Set up SSH on every node so that the nodes can communicate with one another without being prompted for a password.
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
Installing Hadoop
Download and install Hadoop on the master server using the following commands.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.0.tar.gz
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/
Configuring Hadoop
Configure the Hadoop server by making the following changes.
Edit the core-site.xml file as shown below.
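<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://hadoop-master:9000/</value>
   </property>
   <property>
      <name>dfs.permissions</name>
      <value>false</value>
   </property>
</configuration>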
Edit the hdfs-site.xml file as shown below.
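<configuration>
   <property>
      <name>dfs.data.dir</name>
      <value>/opt/hadoop/hadoop/dfs/name/data</value>
      <final>true</final>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>/opt/hadoop/hadoop/dfs/name</value>
      <final>true</final>
   </property>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
</configuration>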
Edit the mapred-site.xml file as shown below.
<configuration>
   <property>
      <name>mapred.job.tracker</name>
      <value>hadoop-master:9001</value>
   </property>
</configuration>
In the hadoop-env.sh file, edit JAVA_HOME, HADOOP_CONF_DIR, and HADOOP_OPTS as shown below. Make sure JAVA_HOME matches the Java installation on your system.
export JAVA_HOME=/opt/jdk1.7.0_17
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
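Since the JAVA_HOME value above is specific to one machine, it can be worth confirming that the directory really contains a Java binary before starting any daemons. The helper below is a minimal sketch; the name valid_java_home is hypothetical, and the test it applies (an executable bin/java under the directory) is an assumption about how the JDK is laid out.

```shell
# Succeed if the given directory looks like a Java installation,
# that is, it contains an executable bin/java.
# (valid_java_home is a hypothetical helper, not part of Hadoop.)
valid_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example:
#   valid_java_home /opt/jdk1.7.0_17 || echo "JAVA_HOME does not point to a JDK"
```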
Installing Hadoop on Slave Servers
Install Hadoop on all the slave servers by copying it from the master with the following commands.
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
Configuring Hadoop on Master Server
Open the master server and configure it with the following commands.
# su hadoop
$ cd /opt/hadoop/hadoop
Master Node Configuration
$ vi conf/masters
Add the following line to the file:
hadoop-master
Slave Node Configuration
$ vi conf/slaves
Add the following lines to the file:
hadoop-slave-1
hadoop-slave-2
Name Node format on Hadoop Master
# su hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
11/10/14 10:58:07 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop-master/192.168.1.109
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.7.0_71
************************************************************/
11/10/14 10:58:08 INFO util.GSet: Computing capacity for map BlocksMap editlog=/opt/hadoop/hadoop/dfs/name/current/edits
………………………………………………….
………………………………………………….
………………………………………………….
11/10/14 10:58:08 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
11/10/14 10:58:08 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.1.15
************************************************************/
Hadoop Services
The following commands start all the Hadoop services on the Hadoop master.
$ cd $HADOOP_HOME/bin
$ ./start-all.sh
Addition of a New DataNode in the Hadoop Cluster
Networking
New nodes can be added to an existing Hadoop cluster, given a suitable network configuration. Assume the following network configuration.
For the new node:
IP address : 192.168.1.103
netmask : 255.255.255.0
hostname : slave3.in
Adding a User and SSH Access
Add a User
On the new node, add a "hadoop" user and set its password to anything you want by using the following commands.
useradd hadoop
passwd hadoop
Next, set up passwordless SSH from the master to the new slave. Execute the following on the master.
mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Copy the public key to the hadoop user's $HOME directory on the new slave node.
scp $HOME/.ssh/id_rsa.pub hadoop@192.168.1.103:/home/hadoop/
Execute the following on the slaves. Log in as the hadoop user, either locally or over SSH.
su hadoop
ssh -X hadoop@192.168.1.103
Copy the content of the public key into the file "$HOME/.ssh/authorized_keys" and then change its permissions by executing the following commands.
cd $HOME
mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
cat id_rsa.pub >>$HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Check the SSH login from the master machine. Verify that you can SSH to the new node from the master without being asked for a password.
ssh hadoop@192.168.1.103 or ssh hadoop@slave3
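The same verification can be scripted so that a password prompt shows up as a failure instead of blocking the terminal. The sketch below relies on the OpenSSH options BatchMode and ConnectTimeout; the function name check_passwordless_ssh and the five-second timeout are illustrative choices, not something this tutorial prescribes.

```shell
# Report, for each user@host argument, whether key-based login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# (check_passwordless_ssh is a hypothetical helper name.)
check_passwordless_ssh() (
    status=0
    for target in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true </dev/null; then
            echo "OK   $target"
        else
            echo "FAIL $target"
            status=1
        fi
    done
    # The function returns non-zero if any target failed.
    exit "$status"
)

# Example, to be run on the master as the hadoop user:
#   check_passwordless_ssh hadoop@192.168.1.103
```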
Set Hostname of New Node
The hostname is set in the file /etc/sysconfig/network.
On the new slave3 machine:
NETWORKING=yes
HOSTNAME=slave3.in
To make the change effective, either restart the machine or run the hostname command on the new machine with the respective hostname.
On slave3 node machine:
hostname slave3.in
Update /etc/hosts on all machines of the cluster with the following line:
192.168.1.103 slave3.in slave3
Now ping the machines by hostname to check whether the names resolve to an IP address.
ping master.in
Start the DataNode on New Node
Start the DataNode daemon manually using the $HADOOP_HOME/bin/hadoop-daemon.sh script. It will automatically contact the master (NameNode) and join the cluster. The new node should also be added to the conf/slaves file on the master server so that the script-based commands recognize it.
Log in to the new node
su hadoop or ssh -X hadoop@192.168.1.103
Start HDFS on the newly added slave node by using the following command.
./bin/hadoop-daemon.sh start datanode
Check the output of the jps command on the new node. It should look like the following.
$ jps
7141 DataNode
10312 Jps
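When this check is repeated across several nodes, it may be easier to test for the daemon by name than to read the listing by eye. The snippet below is a sketch that assumes jps prints one "pid name" pair per line, as in the output above; daemon_running is a hypothetical helper name.

```shell
# Succeed if the named daemon appears in jps-style output read from stdin.
# Each input line is expected to look like "<pid> <name>".
# (daemon_running is a hypothetical helper, not part of Hadoop.)
daemon_running() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example, to be run on the new node:
#   jps | daemon_running DataNode && echo "DataNode is up"
```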
Removing a DataNode
A node can be removed from a cluster while it is running, without any data loss. HDFS provides a decommissioning feature which ensures that removing a node is performed safely.
Step 1
Log in to the master machine as the user under which Hadoop is installed.
$ su hadoop
Step 2
Before starting the cluster, an exclude file must be configured. Add a key named dfs.hosts.exclude to the $HADOOP_HOME/conf/hdfs-site.xml file.
The value associated with this key is the full path to a file on the NameNode's local file system which contains a list of machines that are not permitted to connect to HDFS.
<property>
   <name>dfs.hosts.exclude</name>
   <value>/home/hadoop/hadoop-1.2.1/hdfs_exclude.txt</value>
   <description>DFS exclude</description>
</property>
Step 3
Determine the hosts to decommission.
Add each machine to be decommissioned to the hdfs_exclude.txt file, one host name per line. This prevents those machines from connecting to the NameNode. For example, to decommission slave2.in, the file contains:
slave2.in
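If the exclude file is maintained from a script, appending blindly can leave duplicate entries behind. A minimal sketch of an idempotent append follows; the function name add_to_exclude is hypothetical, and the file path is passed in as an argument so nothing is assumed about where the file lives.

```shell
# Append a host to the exclude file unless it is already listed.
# (add_to_exclude is a hypothetical helper, not part of Hadoop.)
add_to_exclude() (
    exclude_file=$1
    host=$2
    # Create the file if it does not exist yet.
    touch "$exclude_file"
    # -x matches whole lines only, -F treats the host name as a literal string.
    if ! grep -qxF -- "$host" "$exclude_file"; then
        echo "$host" >> "$exclude_file"
    fi
)

# Example:
#   add_to_exclude /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt slave2.in
```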
Step 4
Force a configuration reload.
Run the command "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" without the quotes.
$ $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
This forces the NameNode to re-read its configuration, including the newly updated 'excludes' file. The nodes are decommissioned over a period of time, allowing time for each node's blocks to be replicated onto machines which are scheduled to remain active.
On slave2.in, check the output of the jps command. After some time, the DataNode process will shut down automatically.
Step 5
Shut down the nodes.
After the decommission process has finished, the decommissioned hardware can be safely shut down for maintenance. Run the dfsadmin report command to check the status of the decommission.
$ $HADOOP_HOME/bin/hadoop dfsadmin -report
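The report covers every DataNode and can be long, so a small filter helps when only the decommission state matters. The sketch below assumes that each node's section of the report contains a "Name:" line followed by a "Decommission Status :" line; check this against the output of your own cluster before relying on it. The function name decommission_status is hypothetical.

```shell
# Read a dfsadmin report on stdin and print "<node><TAB><status>" per node.
# Assumes each node section has a "Name: <address>" line followed by a
# "Decommission Status : <status>" line.
# (decommission_status is a hypothetical helper, not part of Hadoop.)
decommission_status() {
    awk '
        /^Name:/ { name = $2 }
        /^Decommission Status/ {
            # Strip the label so that only the status text remains.
            sub(/^Decommission Status[ \t]*:[ \t]*/, "")
            print name "\t" $0
        }'
}

# Example:
#   $HADOOP_HOME/bin/hadoop dfsadmin -report | decommission_status
```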
Step 6
Edit the excludes file again. Once the machines have been decommissioned, they can be removed from the 'excludes' file. Running "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" again reads the excludes file back into the NameNode, which allows the DataNodes to rejoin the cluster after the maintenance has been completed, or if additional capacity is needed in the cluster again.
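Removing a host from the excludes file can be scripted as well. The following is a sketch only: remove_from_exclude is a hypothetical helper, and it rewrites the file in place through a temporary copy. The refreshNodes command still has to be run afterwards.

```shell
# Remove every line that exactly matches the host from the exclude file.
# (remove_from_exclude is a hypothetical helper, not part of Hadoop.)
remove_from_exclude() (
    exclude_file=$1
    host=$2
    [ -f "$exclude_file" ] || exit 1
    scratch=$(mktemp) || exit 1
    # grep exits with status 1 when no lines remain, which is not an error here.
    grep -vxF -- "$host" "$exclude_file" > "$scratch" || true
    # Write back with cat so the original file keeps its owner and permissions.
    cat "$scratch" > "$exclude_file"
    rm -f "$scratch"
)

# Example:
#   remove_from_exclude /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt slave2.in
#   $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
```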
To run/shutdown tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker