Wednesday, December 8, 2010

How configure Secondary namenode on a separate machine

If you have installed cloudera's hadoop distribution (CDH2) then you must have noticed that running command start-dfs.sh starts an instance of SecondaryNameNode process on all the datanodes. This is happening due to the way SecondaryNameNode startup is defined in file bin/start-dfs.sh. 


Scenario 1 : If you want to run your SecondaryNameNode on some other server (say sn.jeka.com) instead of the datanodes then do the following 

1. Logon to JobTracker (I am going to JobTracker because I have set variable HADOOP_MASTER in file ${HADOOP_HOME}/conf/hadoop-env.sh to point to the JobTracker hence any changes made there will be synched to your cluster) 
  • Create a new file ${HADOOP_HOME}/conf/secondarynamenode and add following line
    sn.jeka.com
  • In file ${HADOOP_HOME}/bin/start-dfs.sh, replace line
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
    with
    ssh $(cat $HADOOP_CONF_DIR/secondarynamenode) "${bin}/hadoop-daemon.sh --config $HADOOP_CONF_DIR --hosts secondarynamenode start secondarynamenode;exit"
  • In file ${HADOOP_HOME}/bin/stop-dfs.sh, replace line
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode
    with
    ssh $(cat $HADOOP_CONF_DIR/secondarynamenode) "${bin}/hadoop-daemon.sh --config $HADOOP_CONF_DIR --hosts secondarynamenode stop secondarynamenode;exit"

2.  Logon to Namenode and execute the following commands
  • ${HADOOP_HOME}/bin/stop-dfs.sh; ${HADOOP_HOME}/bin/start-dfs.sh; ${HADOOP_HOME}/bin/stop-dfs.sh; ${HADOOP_HOME}/bin/start-dfs.sh
You have to start and stop twice because in the first start, the code will be synched from JobTracker 
Thats! it. You secondary name node process will now start on the designated server, i.e. sn.jeka.com and not on the datanodes. 

Scenario 2 : If you want to run your SecondaryNameNode on the NameNode (say nn.jeka.com) itself then do the following 
Follow same steps as Scenario 1 except that replace all intances of sn.jeka.com to nn.jeka.com

Scenario 3 : If you do not want to run secondary name node at all then do the following
Follow same steps as Scenario 1 except that instead of replacing lines, delete them. 


5 comments:

  1. Nice writing.

    One thing that I want to note, is that in my understanding, by default, it starts secondaryNameNodes on all master nodes, not data nodes. At least that's what i observed.

    ReplyDelete
  2. Worthful Hadoop tutorial. Appreciate a lot for taking up the pain to write such a quality content on Hadoop course. Just now I watched this similar Hadoop tutorial and I think this will enhance the knowledge of other visitors for sure. Thanks anyway.https://www.youtube.com/watch?v=1jMR4cHBwZE

    ReplyDelete

  3. Great presentation of Hadoop form of blog and Hadoop tutorial. Very helpful for beginners like us to understand Hadoop course. if you're interested to have an insight on Hadoop training do watch this amazing tutorial.https://www.youtube.com/watch?v=1jMR4cHBwZE

    ReplyDelete
  4. Interested to know the top 10 technologies of 2019? Watch this:https://www.youtube.com/watch?v=-y5Z2fmnp-o

    ReplyDelete
  5. This comment has been removed by the author.

    ReplyDelete