Tuesday, November 16, 2010

How to use Hadoop Streaming

RUN HADOOP STREAMING

hadoop jar ${HADOOP_STREAMING} -input 'test/a' -output 'testout' -mapper '/bin/cat' -reducer '/usr/bin/wc -l'
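
Before submitting a job like this, the mapper and reducer can be dry-run locally with an ordinary pipe. The following is only a rough sketch, not how Hadoop actually executes the job: it assumes a single reducer, uses sort as a stand-in for the shuffle phase, and simulate_streaming is a made-up helper name, not part of Hadoop.

```shell
# Local stand-in for: map -> shuffle/sort -> reduce (assumes one reducer).
# simulate_streaming is a hypothetical helper, not a Hadoop command.
simulate_streaming() {
    mapper=$1
    reducer=$2
    # Input arrives on stdin; sort approximates the shuffle between phases.
    sh -c "$mapper" | LC_ALL=C sort | sh -c "$reducer"
}

# Three input lines through cat and wc -l produce a line count.
printf 'b\na\nc\n' | simulate_streaming 'cat' 'wc -l'
```

If the pipe gives the expected result on a small sample, the same two command strings can be passed to -mapper and -reducer.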


RUN HADOOP STREAMING WITH NO REDUCE TASK

hadoop fs -rmr testout; hadoop jar ${HADOOP_STREAMING} -input 'testin' -output 'testout' -mapper '/bin/cat -n' -reducer '' -jobconf mapred.reduce.tasks=0
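
With zero reduce tasks there is no sort and no reduce step, so whatever the mapper prints becomes the job output. A minimal local sketch of that behaviour, assuming a single map task (simulate_map_only is a hypothetical helper name used only for illustration):

```shell
# Map-only stand-in: no sort, no reducer; mapper output is the job output.
# simulate_map_only is a hypothetical helper, not a Hadoop command.
simulate_map_only() {
    sh -c "$1"
}

# Input order is preserved because nothing sorts the mapper output.
printf 'b\na\n' | simulate_map_only 'cat -n'
```

On a real cluster each map task runs its own copy of the mapper on its own input split, so the numbers printed by cat -n would restart in every task rather than run across the whole input.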


HOW TO RUN A QUOTED UNIX COMMAND STORED IN A SHELL VARIABLE IN STREAMING

mawk="awk '{print NR,\$0}'"; hadoop fs -rmr testout; hadoop jar ${HADOOP_STREAMING} -input 'testin' -output 'testout' -mapper '/bin/cat -n' -reducer "${mawk}"
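
The quoting in the mawk assignment is the delicate part: inside double quotes, \$0 is stored as a literal $0 so that awk, not the shell, interprets it. The sketch below prints the stored string and then runs it locally. Note that sh -c is used here only as a local approximation; how Streaming itself tokenizes the command string may differ.

```shell
# Inside double quotes, \$0 survives as a literal $0 for awk to interpret.
mawk="awk '{print NR,\$0}'"
printf '%s\n' "$mawk"          # prints: awk '{print NR,$0}'

# Running the stored string through a shell applies the inner single quotes,
# so awk prefixes each input line with its record number.
printf 'x\ny\n' | sh -c "$mawk"
```

Without the backslash, the shell would expand $0 at assignment time (typically to the shell's own name), and awk would receive a different program.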

1 comment:

  1. I like your post and thought maybe you could help me understand this.
    Whenever I try to use Java class files as my mapper and/or reducer, I get
    the following error:

    java.io.IOException: Cannot run program "MapperTst.class":
    java.io.IOException: error=2, No such file or directory

    I executed the following command on the terminal:

    hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar
    contrib/streaming/hadoop-streaming-0.20.203.0.jar -file
    /home/hadoop/codes/MapperTst.class -mapper
    /home/hadoop/codes/MapperTst.class -file
    /home/hadoop/codes/ReducerTst.class -reducer
    /home/hadoop/codes/ReducerTst.class -input gutenberg/* -output
    gutenberg-outputtstch27

    Please let me know where I am going wrong.

    Thanks in advance.

    Regards

    Shrish
