Hadoop Recovery
Started by Champion of Cyrodiil, Nov 05 2012 01:10 PM
hadoop recovery edits file namenode offsets previous.checkpoint
5 replies to this topic
#1
Posted 05 November 2012 - 01:10 PM
Had a corrupted NameNode on a master node VM this morning at the office... apparently the 'edits' file could not be read at offset 126. However, when I looked at the edits file it was only about 4 bytes long, so it didn't even contain an offset 126 (the last valid offset was 0x4).
My SecondaryNameNode had the exact same edits file, so I could not use it for a recovery... at one point I think it complained that the version was 3 when the NameNode was expecting version 4 during init.
I decided to just roll the VM back to an existing snapshot (using ESXi 4.1)... but later read that I could have created a new NameNode and imported the previous.checkpoint.
Anyone have any experience recovering Hadoop from a corrupted edits file? Or recovering any other aspect of Hadoop? I'm trying to get a better understanding of the SecondaryNameNode's role within the cluster, as well as the purpose behind the checkpoints and how often they are/should be created.
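For reference, the checkpoint-based recovery path mentioned in the opening post is driven by the `fs.checkpoint.*` properties in Hadoop 0.20.x, which control where and how often the SecondaryNameNode writes its merged image. A minimal sketch (the directory path is an illustrative assumption, not from this cluster):

```xml
<!-- core-site.xml: SecondaryNameNode checkpoint settings (Hadoop 0.20.x).
     The directory below is an illustrative assumption; use whatever
     your deployment actually configures. -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/var/hadoop/namesecondary</value>
  <!-- where the SecondaryNameNode stores the merged fsimage
       (the "previous.checkpoint" data) -->
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <!-- seconds between checkpoints; the default is one hour -->
</property>
```

With a valid checkpoint on disk, a replacement NameNode with an empty `dfs.name.dir` can be started with `hadoop namenode -importCheckpoint`, which loads the image from `fs.checkpoint.dir` -- the "create a new NameNode and import the previous.checkpoint" approach described above.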
#2
Posted 07 November 2012 - 02:17 PM
After speaking with one of the Apache developers for a while, I think we narrowed down the issue. Some of the developers were using hadoop-core-1.0.2.jar in the client build path. The version of Hadoop supported in my environment (because it's what the customer environment supports) is Hadoop 0.20.2. Using the newer client would be okay for reading data and, in theory, writing simple mutations; but for bulk imports and MapReduce jobs, you would want hadoop-core-0.20.2.jar in your client.
The developers are actually using Accumulo/Cloudbase, which stacks on ZooKeeper, which stacks on the Hadoop stack... but the client jar files for Cloudbase depend on hadoop-core and, judging by the package names, most definitely make use of the bulk-import and MapReduce capability.
Thus, use the client library version that matches the server you're running, even if the newer libraries claim to be backwards compatible.
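If the client build is Maven-based (an assumption; the post only says "client buildpath"), pinning the dependency to the server's version would look like this:

```xml
<!-- Pin the client to the server's Hadoop version (0.20.2 here)
     instead of letting a newer hadoop-core (e.g. 1.0.2) leak in
     transitively. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
</dependency>
```

Any transitive pull of a newer hadoop-core (e.g. via the Cloudbase client jars) would then need to be excluded so this pinned version wins on the classpath.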
#3
Posted 07 November 2012 - 02:19 PM
Another error that popped up from the same problem was:
WARN: Incorrect header or version mismatch from <ClientHost>:<ephemeral port> got version 3 expected version 4
#4
Posted 07 November 2012 - 02:20 PM
Also, the cluster was pseudo-distributed on a single VM, so it had only one DataNode... there weren't even enough slaves to make the cluster properly fault tolerant. A cluster should have a minimum of three DataNodes.
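The three-DataNode minimum lines up with HDFS's default block replication factor. A minimal hdfs-site.xml sketch (the value shown is the stock default, not something taken from this cluster's config):

```xml
<!-- hdfs-site.xml: block replication. The default is 3, so a
     pseudo-distributed cluster with a single DataNode can never
     place all replicas, and losing that one node loses the data. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```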
#5
Posted 07 November 2012 - 11:24 PM
From my experience, mysterious errors that come without you changing anything are almost always PEBKAC on the part of another user.
Rumors of my demise have been greatly exaggerated.
#6
Posted 08 November 2012 - 03:32 PM
Yea... I'm supporting a team of about 15 Java/ozone developers that haven't used Hadoop... also I'm new to it, so it's understandable... this time.