The following steps set up a standalone installation of Hadoop 0.20 on Java 6. Please note that setting up a Hadoop cluster is very different from setting up a standalone version.
INSTALL
- Get a Linux machine
- Suppose your username is jeka and home directory is /home/jeka or ~
- Create directories ~/java and ~/hadoop
- Download the required software if you do not already have it
- Download the Java release (file jdk-6u22-linux-i586.bin) to directory ~/java
- Download the Hadoop release (file hadoop-0.20.2.tar.gz) to directory ~/hadoop
- Install Java
- Make the Java installer executable
chmod a+x ~/java/jdk-6u22-linux-i586.bin
- Run it from ~/java, because the installer unpacks the JDK into the current directory
cd ~/java && ./jdk-6u22-linux-i586.bin
- Install Hadoop
- Unzip
gunzip ~/hadoop/hadoop-0.20.2.tar.gz
- Untar (gunzip has renamed the file to hadoop-0.20.2.tar; -C unpacks it into ~/hadoop)
tar -xvf ~/hadoop/hadoop-0.20.2.tar -C ~/hadoop
That's it! The installation is done and Hadoop is ready to use, but to make life a little easier we should set up some environment variables.
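Before moving on, it can help to confirm that both installs landed where the rest of this post expects them. The following is only a minimal sketch: check_tool is a hypothetical helper (not part of Java or Hadoop), and the two paths assume the directory layout used above.

```shell
# Hypothetical helper: report whether INSTALL_DIR/bin/NAME exists and is executable.
# Usage: check_tool NAME INSTALL_DIR
check_tool() {
    name="$1"
    dir="$2"
    if [ -x "$dir/bin/$name" ]; then
        echo "$name: ok"
    else
        echo "$name: missing"
        return 1
    fi
}

# Assumed paths: the layout used in this post.
check_tool java "$HOME/java/jdk1.6.0_22" || true
check_tool hadoop "$HOME/hadoop/hadoop-0.20.2" || true
```

If either line reports "missing", the most likely cause is that the archive was unpacked in a different directory than the one shown here.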
CONFIGURE
Both Java and Hadoop provide command-line executables, java and hadoop respectively. These executables can be found in the bin directory of each installation.
- Create a file ~/.hadoop_profile and add the following lines to it
export JAVA_HOME="$HOME/java/jdk1.6.0_22"
export HADOOP_HOME="$HOME/hadoop/hadoop-0.20.2"
export PATH="${PATH}:${JAVA_HOME}/bin:${HADOOP_HOME}/bin"
Save this file and source it
source ~/.hadoop_profile
Now, instead of running Hadoop as ~/hadoop/hadoop-0.20.2/bin/hadoop, you can simply run it as hadoop.
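If the short command is not found after sourcing the profile, the usual culprit is that the bin directories did not make it onto PATH. Here is a small sketch for checking that; on_path and report are hypothetical helpers written for this post, and they only inspect the variables, they do not change anything.

```shell
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries of PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: describe the state of one variable.
# Usage: report VARIABLE_NAME VARIABLE_VALUE
report() {
    if [ -z "$2" ]; then
        echo "$1 is not set"
    elif on_path "$2/bin"; then
        echo "$1/bin is on PATH"
    else
        echo "$1/bin is NOT on PATH"
    fi
}

report JAVA_HOME "${JAVA_HOME:-}"
report HADOOP_HOME "${HADOOP_HOME:-}"
```

Run it in the same shell where you sourced ~/.hadoop_profile, since exported variables do not carry over to terminals that were already open.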
RUN YOUR FIRST HADOOP JOB
Note: In standalone mode this job runs on your local machine and uses the local file system, not HDFS.
The file ~/hadoop/hadoop-0.20.2/hadoop-0.20.2-examples.jar ships with several example MapReduce programs.
In the following example, we use one of them, "grep", to count the number of times the word "copyright" appears in the file LICENSE.txt.
cd ~/hadoop/hadoop-0.20.2
hadoop jar hadoop-0.20.2-examples.jar grep LICENSE.txt ~/tmp/out "copyright"
View the result:
cat ~/tmp/out/*
Output: 4
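To sanity-check the number Hadoop reports, you can count the same matches with plain grep. This is just a rough cross-check, not what the Hadoop job does internally: count_matches is a hypothetical helper, it treats the word as a literal string rather than a regular expression (which makes no difference for a plain word like "copyright"), and it relies on grep's -o option as found on Linux.

```shell
# Hypothetical helper: count every occurrence of a literal string in a file
# (occurrences, not matching lines). The match is case-sensitive.
# Usage: count_matches WORD FILE
count_matches() {
    # -o prints each match on its own line; -F treats WORD as a literal string
    grep -o -F -- "$1" "$2" | wc -l | tr -d ' '
}

# Assumes the current directory is ~/hadoop/hadoop-0.20.2
if [ -f LICENSE.txt ]; then
    count_matches "copyright" LICENSE.txt
fi
```

One more thing to keep in mind if you rerun the job: Hadoop will not write into an output directory that already exists, so remove ~/tmp/out first.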
It's very simple to create your own jar and run it instead of using the examples jar. See blog post
http://hadoop-blog.blogspot.com/2010/11/how-to-run-and-compile-hadoop-program.html for more details