Wednesday, December 1, 2010

Error in starting Datanode - ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in

While experimenting with a Hadoop setup, I found that, after a change I had made, the datanodes would no longer start when I ran the script bin/start-dfs.sh.

These are the steps I followed to troubleshoot and fix the issue.

TO FIND THE PROBLEM
  1. Log in to any one of the datanodes that is not starting up (say t1.mydomain.com).
  2. Get the value of the variable HADOOP_LOG_DIR in the file conf/hadoop-env.sh. Say the value is /home/jeka/runtime_hadoop_data/logs.
  3. Look in the datanode log file /home/jeka/runtime_hadoop_data/logs/hadoop-jeka-datanode-t1.mydomain.com.log. The following error was logged:
    2010-12-01 18:57:39,115 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/jeka/runtime_hdfs/datanode: namenode namespaceID = 1509057607; datanode namespaceID = 1781994419
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
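
Rather than reading the whole log by eye, the error can be pulled out with grep. This is only a small sketch: the function name find_namespace_errors is made up for illustration, and it assumes the error message text is exactly as shown in the log excerpt above.

```shell
# Hypothetical helper (not part of Hadoop): print every
# "Incompatible namespaceIDs" line found in a datanode log file.
# Argument: $1 = path to the datanode log file.
find_namespace_errors() {
    grep 'Incompatible namespaceIDs' "$1"
}
```

For the example host it would be called as find_namespace_errors /home/jeka/runtime_hadoop_data/logs/hadoop-jeka-datanode-t1.mydomain.com.log. The matching line shows both the namenode and the datanode namespaceID.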

REASON
  1. This issue happened right after I reformatted the namenode without reformatting the datanodes. Every time the namenode is formatted, Hadoop creates a new unique namespaceID and stores it in a file on the namenode. Because I did not reformat the datanodes, they still held the old namespaceID, which no longer matched the namenode's.
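
The mismatch can be confirmed directly by reading the namespaceID out of the VERSION file on each side. The snippet below is a minimal sketch: get_namespace_id is a made-up helper name, and it assumes the VERSION file contains a line of the form namespaceID=<number>.

```shell
# Hypothetical helper (not part of Hadoop): print the namespaceID
# stored in a VERSION file.
# Argument: $1 = path to the VERSION file.
get_namespace_id() {
    # -n suppresses normal output; the p flag prints only the line
    # where the "namespaceID=" prefix was stripped.
    sed -n 's/^namespaceID=//p' "$1"
}
```

Running get_namespace_id on /home/jeka/runtime_hdfs/namenode/current/VERSION on the namenode and on /home/jeka/runtime_hdfs/datanode/current/VERSION on a datanode should print two different numbers when this problem is present.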


SOLUTION 
The solution is to copy the new namespaceID from the namenode to the datanodes. The steps are as follows:
  1. Log on to the namenode (say nn.mydomain.com).
  2. Stop DFS by running the command bin/stop-dfs.sh.
  3. Find the values of the properties dfs.name.dir and dfs.data.dir in the file conf/hdfs-site.xml. If they are missing there, look for them in the file src/hdfs/hdfs-default.xml. Say the values of these properties are /home/jeka/runtime_hdfs/namenode and /home/jeka/runtime_hdfs/datanode respectively.
  4. Note the value of the field namespaceID in the file /home/jeka/runtime_hdfs/namenode/current/VERSION. This is the new namenode namespaceID that we need to copy to all datanodes.

    In our case it is 1509057607.

    For reference, here are the full contents of this file:
    #Wed Dec 01 19:05:31 UTC 2010
    namespaceID=1509057607
    cTime=0
    storageType=NAME_NODE
    layoutVersion=-18
  5. Now copy the new namespaceID (1509057607 in our case) to the file /home/jeka/runtime_hdfs/datanode/current/VERSION on all datanodes by running the following command at the shell prompt on the namenode:

    for dn in $(cat ~/hadoop-0.20.2/conf/slaves); do ssh "$dn" "sed 's/^namespaceID=[0-9]*/namespaceID=1509057607/' /home/jeka/runtime_hdfs/datanode/current/VERSION > /home/jeka/runtime_hdfs/datanode/current/VERSION.temp && mv /home/jeka/runtime_hdfs/datanode/current/VERSION.temp /home/jeka/runtime_hdfs/datanode/current/VERSION"; done

    This command visits every datanode listed in the file conf/slaves and changes its namespaceID to 1509057607.
  6. Start DFS by running the command bin/start-dfs.sh. That's it! All datanodes should now be up and running.
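
A more cautious variant of the edit in step 5 keeps a backup of the VERSION file and rejects a malformed ID before touching anything. This is a sketch only: set_namespace_id is a made-up helper name, the .bak suffix is my own choice, and it assumes the namespaceID is a plain non-negative integer, as in the example above.

```shell
# Hypothetical helper (not part of Hadoop): rewrite the namespaceID
# line of a VERSION file, keeping the original as <file>.bak.
# Arguments: $1 = path to the VERSION file, $2 = new namespaceID.
set_namespace_id() {
    version_file=$1
    new_id=$2
    # Assumption: the ID consists of digits only; reject anything else.
    case $new_id in
        ''|*[!0-9]*)
            echo "invalid namespaceID: $new_id" >&2
            return 1
            ;;
    esac
    # Take the backup first, then regenerate the file from the backup,
    # so the original contents are never lost.
    cp "$version_file" "$version_file.bak" &&
    sed "s/^namespaceID=.*/namespaceID=$new_id/" "$version_file.bak" > "$version_file"
}
```

On a datanode it would be called as set_namespace_id /home/jeka/runtime_hdfs/datanode/current/VERSION 1509057607. If the datanode still fails to start afterwards, the .bak copy can be moved back into place.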

