SCENARIO
The cluster has 5 machines, as follows:
# | SERVERNAME | HADOOP COMPONENT | SHORT NAME |
1 | jt.mydomain.com | Jobtracker | jt |
2 | nn.mydomain.com | Namenode | nn |
3 | sn.mydomain.com | Secondary Namenode | sn |
4 | tt1.mydomain.com | Tasktracker1 | tt1 |
5 | tt2.mydomain.com | Tasktracker2 | tt2 |
STEPS
1. Download Hadoop and Java to all machines
See the INSTALL section on http://hadoop-blog.blogspot.com/2010/11/how-to-install-standalone-hadoop-for.html for more details. For the purposes of this document we will assume that Hadoop is installed in directory /home/${USER}/hadoop-0.20.2 and Java is installed in directory /home/${USER}/jdk1.6.0_22
2. Ensure that machines in the cluster can see each other
Set up passwordless ssh between the following machine pairs:
- jt to nn
- jt to sn
- jt to tt1
- jt to tt2
- nn to jt
- nn to sn
- nn to tt1
- nn to tt2
- sn to nn
- sn to jt
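One way to wire this up (a sketch, not the only way): on each source machine, generate a passphrase-less key once, then push its public half to every target with ssh-copy-id. The key path below is a throwaway used for illustration; in practice you would use the default ~/.ssh/id_rsa.

```shell
# Sketch: create a passphrase-less RSA key pair (throwaway path for illustration).
rm -f /tmp/demo_id_rsa /tmp/demo_id_rsa.pub
ssh-keygen -q -t rsa -N "" -f /tmp/demo_id_rsa

# Then, from jt for example, push the public key to every machine jt must reach.
# (Commented out here because it needs the real hosts from the table above.)
# for host in nn.mydomain.com sn.mydomain.com tt1.mydomain.com tt2.mydomain.com; do
#     ssh-copy-id -i /tmp/demo_id_rsa.pub "${USER}@${host}"
# done
# ssh nn.mydomain.com hostname   # should now log in without a password prompt
```

Repeat the push loop from each source machine in the pair list above (jt, nn and sn), adjusting the target hosts accordingly.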
3. Set up the Namenode
On jt.mydomain.com, overwrite file /home/${USER}/hadoop-0.20.2/conf/core-site.xml with the following lines
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://nn.mydomain.com/</value>
</property>
</configuration>
4. Set up path to Java, Master and Slave directories/files
In Hadoop, the Jobtracker and Namenode are called masters, and the tasktrackers and datanodes are called slaves. Every slave runs both a Datanode and a Tasktracker.
On jt.mydomain.com, add the following 3 lines to file /home/${USER}/hadoop-0.20.2/conf/hadoop-env.sh
export JAVA_HOME=/home/${USER}/jdk1.6.0_22
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
export HADOOP_MASTER=jt.mydomain.com:/home/${USER}/hadoop-0.20.2
5. Set up the Jobtracker
On jt.mydomain.com, overwrite file /home/${USER}/hadoop-0.20.2/conf/mapred-site.xml with the following lines
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>jt.mydomain.com:9001</value>
</property>
</configuration>
6. List Masters
On jt.mydomain.com, overwrite file /home/${USER}/hadoop-0.20.2/conf/masters with the following line
sn.mydomain.com
7. List Slaves
On jt.mydomain.com, overwrite file /home/${USER}/hadoop-0.20.2/conf/slaves with the following lines
tt1.mydomain.com
tt2.mydomain.com
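Because masters and slaves are plain one-hostname-per-line files, they are easy to script. A minimal sketch (/tmp/hadoop-conf-demo is an assumed scratch path, standing in for /home/${USER}/hadoop-0.20.2/conf):

```shell
CONF=/tmp/hadoop-conf-demo      # stand-in for the real conf directory
mkdir -p "$CONF"

# masters holds the Secondary Namenode; slaves holds the Datanode/Tasktracker hosts.
echo "sn.mydomain.com" > "$CONF/masters"
printf '%s\n' tt1.mydomain.com tt2.mydomain.com > "$CONF/slaves"

# The start scripts simply iterate over these files, one host per line.
cat "$CONF/slaves"
```

Note that despite the name, the masters file lists only the Secondary Namenode; the Namenode and Jobtracker are located via core-site.xml and mapred-site.xml instead.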
8. Format Namenode
Format the namenode by running the following command on nn.mydomain.com
/home/${USER}/hadoop-0.20.2/bin/hadoop namenode -format
9. Start DFS
On the nn.mydomain.com command prompt, run the following command to start the HDFS daemons on the Namenode and datanodes; it will also start the Secondary Namenode.
sh /home/${USER}/hadoop-0.20.2/bin/start-dfs.sh
10. Start MapReduce
On the jt.mydomain.com command prompt, run the following command to start the MapReduce daemons on the Jobtracker and tasktrackers.
sh /home/${USER}/hadoop-0.20.2/bin/start-mapred.sh
That's it! The Hadoop cluster is up and running now.
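A quick way to confirm, sketched below: jps (bundled with the JDK) lists the running Java processes on each machine, and the ssh loop relies on the passwordless logins from step 2. Hosts and paths are the ones assumed throughout this post.

```shell
# Run from jt.mydomain.com; each machine should show its own daemon(s).
JPS=/home/${USER}/jdk1.6.0_22/bin/jps
"$JPS"                          # on jt itself: expect JobTracker
ssh nn.mydomain.com  "$JPS"     # expect NameNode
ssh sn.mydomain.com  "$JPS"     # expect SecondaryNameNode
ssh tt1.mydomain.com "$JPS"     # expect DataNode and TaskTracker
ssh tt2.mydomain.com "$JPS"     # expect DataNode and TaskTracker
```

If a daemon is missing, check the corresponding log file under /home/${USER}/hadoop-0.20.2/logs on that machine.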
NOTES
- The cluster is defined in files slaves and masters
- You can use IP addresses instead of host names, i.e. 10.2.3.4 instead of jt.mydomain.com
- After you execute #9 and #10, you will notice that all the files updated in steps #3-7 on jt.mydomain.com are also updated on the nn, sn, tt1 and tt2 machines. This happens because we set the HADOOP_MASTER property to jt.mydomain.com in step #4: the start scripts treat the config files on jt.mydomain.com as the master copies and sync them to every node in the cluster.
- Jobtracker WebUI should be up and running on http://jt.mydomain.com:50030/jobtracker.jsp
- HDFS WebUI should be up and running on http://nn.mydomain.com:50070/dfshealth.jsp
I also found another excellent tutorial, better than my blog post for sure.