Tuesday, December 7, 2010

Hadoop distcp error: java.lang.NumberFormatException: For input string: ""

Hadoop provides the distcp command to copy data between clusters.


ERROR
When running this command, I got the following error:
java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:470)
        at java.lang.Integer.parseInt(Integer.java:499)
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:149)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:164)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:81)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1448)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1476)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:197)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.tools.DistCp.setup(DistCp.java:997)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:884)

REASON
This error normally happens when the port number is missing from the source or destination HDFS URI while the colon that precedes it is still present.


Example: running the following command produces this error:
hadoop distcp hdfs://hadoop1.jeka.com:9000/user/hadoop/jeka hdfs://hadoop2.jeka.com:/user/hadoop/jeka

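It happened because the port in the destination URI hdfs://hadoop2.jeka.com:/user/hadoop/jeka is missing, but the colon that separates the host from the port is still there. The text after the colon is an empty string, which cannot be parsed as an integer.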
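To make the cause concrete, here is a minimal shell sketch that splits the host and port out of each URI from the example. The function name split_authority is made up for illustration, and the logic only imitates a plain "host:port" split; it is not Hadoop's actual parsing code.

```shell
#!/bin/sh
# Illustrative only: split the authority part of a URI into host and port
# the way a plain "host:port" parser would. Not Hadoop's actual code.
split_authority() {
    uri=$1
    rest=${uri#*://}          # drop the scheme, e.g. "hdfs://"
    authority=${rest%%/*}     # keep everything before the first "/"
    case $authority in
        *:*) host=${authority%%:*}; port=${authority#*:} ;;
        *)   host=$authority;       port='(none)' ;;
    esac
    printf 'host=%s port=[%s]\n' "$host" "$port"
}

split_authority 'hdfs://hadoop1.jeka.com:9000/user/hadoop/jeka'
split_authority 'hdfs://hadoop2.jeka.com:/user/hadoop/jeka'
```

The second call prints an empty pair of brackets for the port. That empty string is what ends up in Integer.parseInt in the stack trace above.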

RESOLUTION
Add the port number (9000 in this setup) after the colon, so that the destination URI becomes hdfs://hadoop2.jeka.com:9000/user/hadoop/jeka

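Or remove the colon. With no port in the URI, the HDFS client falls back to the default NameNode port (8020 in most Hadoop releases, not 9000), so this form only works if the destination NameNode listens on that default port. The destination URI will then look like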
hdfs://hadoop2.jeka.com/user/hadoop/jeka 
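One way to catch this before launching a copy is a small check in the wrapper script. The sketch below is a hypothetical helper (hdfs_uri_port_ok is not part of Hadoop) and assumes plain host names, not IPv6 literals. It accepts a URI with no colon or with a numeric port, and rejects a colon followed by nothing or by non-digits.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): succeed only if the URI's
# authority has no colon at all, or a colon followed by digits only.
hdfs_uri_port_ok() {
    rest=${1#*://}                      # drop the scheme
    authority=${rest%%/*}               # keep host[:port]
    case $authority in
        *:*) port=${authority##*:} ;;   # text after the last colon
        *)   return 0 ;;                # no colon: client uses its default port
    esac
    case $port in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric port
        *)           return 0 ;;
    esac
}

src='hdfs://hadoop1.jeka.com:9000/user/hadoop/jeka'
dst='hdfs://hadoop2.jeka.com:/user/hadoop/jeka'
for uri in "$src" "$dst"; do
    hdfs_uri_port_ok "$uri" || echo "bad port in URI: $uri" >&2
done
# Only once both URIs pass would the script go on to run:
#   hadoop distcp "$src" "$dst"
```

With the URIs from the failing example, the check flags only the destination.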
