Hadoop NameNode doesn’t want to share its blocks

Posted by scottk on Mar 22, 2011 in Hadoop

We've been running into a lot of the following errors in our Hadoop install:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/that_one_file/part-00000
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1268)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)

at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2939)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2094)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2281)

I believe the issue is that our dfs.namenode.handler.count is set to 25. With a cluster of 20 servers, the NameNode gets flooded with requests when the reducers finish and write their output to HDFS, on top of the other HDFS traffic we have going on at any given time. If someone else knows better, please let me know.
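
If the handler count really is the bottleneck, the setting to experiment with lives in hdfs-site.xml on the NameNode. Here is a minimal sketch of what that change might look like. The value 64 is only an illustrative placeholder, not a tested recommendation for a 20-node cluster, and the NameNode would most likely need a restart to pick it up.

```xml
<!-- hdfs-site.xml on the NameNode (sketch only) -->
<configuration>
  <property>
    <name>dfs.namenode.handler.count</name>
    <!-- Currently 25 on our cluster; 64 is a hypothetical value to try, not a measured optimum -->
    <value>64</value>
  </property>
</configuration>
```

Whether a higher count makes the NotReplicatedYetException go away is still just my guess, so it would be worth comparing error rates before and after the change.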

2 Comments

Bunty
Aug 24, 2011 at 6:20 pm

Did you find a workaround for this exception?


 
scottk
Aug 25, 2011 at 7:17 am

I added a second NIC port to the servers and bonded the interfaces, and the errors seemed to go away. I’m not sure if that was the answer or if the developers pushed a code change.


 

