Wednesday, December 8, 2010

How to configure the SecondaryNameNode on a separate machine

If you have installed Cloudera's Hadoop distribution (CDH2), you must have noticed that running command starts an instance of the SecondaryNameNode process on all the datanodes. This happens because of the way SecondaryNameNode startup is defined in file bin/ 

Scenario 1: If you want to run your SecondaryNameNode on some other server instead of the datanodes, do the following:

1. Log on to the JobTracker. (I use the JobTracker because I have set the variable HADOOP_MASTER in file ${HADOOP_HOME}/conf/ to point to it, so any changes made there will be synced to the rest of the cluster.)
  • Create a new file ${HADOOP_HOME}/conf/secondarynamenode and add a single line containing the hostname of the server that should run the SecondaryNameNode
  • In file ${HADOOP_HOME}/bin/, replace the line
    "$bin"/ --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
    with
    ssh $(cat $HADOOP_CONF_DIR/secondarynamenode) "${bin}/ --config $HADOOP_CONF_DIR --hosts secondarynamenode start secondarynamenode;exit"
  • In file ${HADOOP_HOME}/bin/, replace the line
    "$bin"/ --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode
    with
    ssh $(cat $HADOOP_CONF_DIR/secondarynamenode) "${bin}/ --config $HADOOP_CONF_DIR --hosts secondarynamenode stop secondarynamenode;exit"
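
The replacement lines above depend on $(cat $HADOOP_CONF_DIR/secondarynamenode) expanding to exactly one hostname. As a rough sketch of that lookup with a guard for a missing or empty file, something like the following could be used. The function name snn_host and the handling of comments and blank lines are my own assumptions for illustration; they are not part of the Hadoop scripts.

```shell
# Hypothetical helper (not part of Hadoop): print the host named in
# <conf_dir>/secondarynamenode, assuming one hostname per file.
snn_host() {
    conf_dir="$1"
    file="$conf_dir/secondarynamenode"
    # Fail early if the file is missing or empty, so ssh is never
    # invoked with an empty host argument.
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    # Strip '#' comments, drop blank lines, keep the first entry,
    # and remove any stray whitespace around it.
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$file" | head -n 1 | tr -d '[:space:]'
}
```

It could be called as host=$(snn_host "$HADOOP_CONF_DIR") before the ssh line, so that a missing file produces an error message instead of a malformed ssh command.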

2. Log on to the NameNode and execute the following commands:
  • ${HADOOP_HOME}/bin/; ${HADOOP_HOME}/bin/; ${HADOOP_HOME}/bin/; ${HADOOP_HOME}/bin/
You have to stop and start twice because the modified code is only synced from the JobTracker during the first start.
That's it! Your SecondaryNameNode process will now start on the designated server and not on the datanodes.
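
One way to confirm where the process ended up is to look for SecondaryNameNode in the output of the JDK's jps tool on each host. The sketch below only parses jps-style output from standard input; the function name is hypothetical, and running jps on the remote host (for example over ssh, assuming jps is on the PATH there) is left to the caller.

```shell
# Hypothetical check: succeed if jps-style output on stdin lists a
# SecondaryNameNode. jps prints one "<pid> <main class>" pair per line.
has_secondarynamenode() {
    awk '$2 == "SecondaryNameNode" { found = 1 } END { exit !found }'
}
```

Piping the jps output of the designated server into has_secondarynamenode should succeed, while the same check against a datanode should fail once the change has taken effect.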

Scenario 2: If you want to run your SecondaryNameNode on the NameNode itself, do the following:
Follow the same steps as Scenario 1, except that the NameNode's hostname is used in place of the separate server's hostname.

Scenario 3: If you do not want to run a SecondaryNameNode at all, do the following:
Follow the same steps as Scenario 1, except that instead of replacing the lines, delete them.
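
If you would rather not edit the scripts by hand, the deletion can be previewed with a small filter. This is only a sketch under my own assumptions: the function name strip_snn_lines is made up, the script path is passed in as an argument, and the result is printed to standard output so that the original file is left untouched until you have reviewed the result.

```shell
# Hypothetical helper: print a script without the lines that start or
# stop the secondarynamenode. The input file is not modified.
strip_snn_lines() {
    grep -v -E '(start|stop) secondarynamenode' "$1"
}
```

Comparing the output against the original with diff shows exactly which lines would be removed.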

1 comment:

  1. Nice writing.

    One thing I want to note is that, in my understanding, by default it starts SecondaryNameNodes on all master nodes, not data nodes. At least that's what I observed.