Hard Disk Recovery 2
From Freehackers
Following the previous article, here is yet another hard disk recovery.
The problem here was not the bad sectors (luckily, there were few) but the tools. Contrary to what people usually think, some Unix tools are very bad at this. Let me show you.
So, for some reason (an electrical shock), I have a hard disk that has started to develop bad blocks. The server almost freezes each time Linux tries to access this disk. I remove it and install it in a dedicated computer, along with that computer's own system disk and a brand-new hard disk for the rescued data. I put the bad disk alone on its very own IDE cable.
Estimation of the beginning of the partition
The first thing I do is:
# fdisk -l /dev/hdc
Disk /dev/hdc: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device    Boot     Start         End      Blocks   Id  System
/dev/hdc1  *           1        1217     9775521   fd  Linux raid autodetect
/dev/hdc2           1218        3650    19543072+  fd  Linux raid autodetect
/dev/hdc3           3651        3687      297202+  82  Linux swap / Solaris
/dev/hdc4           3688       19929   130463865    5  Extended
/dev/hdc5           3688        9767    48837568+  fd  Linux raid autodetect
/dev/hdc6           9768       19929    81626233+  83  Linux
I don't care about the swap and RAID partitions. The only one I'm interested in is hdc6. Funnily enough, the kernel can't find it:
# ls /dev/hdc*
/dev/hdc  /dev/hdc1  /dev/hdc2  /dev/hdc3  /dev/hdc4  /dev/hdc5
You can ask the kernel to re-read the partition table. Several tools (partprobe, hdparm, blockdev) can do that, for example:
blockdev --rereadpt /dev/hdc
In this case, though, it doesn't help.
OK, so now the problem is to find out where on the hard disk the partition starts (byte-wise, that is). The trouble is that the tools do not agree on that. The fdisk output above, for example, says hdc6 starts on cylinder 9768. A cylinder is 8225280 bytes, so that would mean 9768*8225280 = 80344535040.
sfdisk -l -uC /dev/hdc (-uC displays cylinder numbers) says:
/dev/hdc6 9767+ 19928 10162- 81626233+ 83 Linux
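They don't agree on the cylinder number! (They do agree on the cylinder size, though.) The reason is that fdisk counts cylinders from 1, while sfdisk -uC counts them from 0. fdisk's cylinder 9768 is therefore sfdisk's cylinder 9767.
So sfdisk puts the start of hdc6 at 9767 * 8225280 = 80336309760.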
sfdisk can also output in sectors (-uS):
/dev/hdc6 156906918 320159384 163252467 83 Linux
or in blocks (-uB):
/dev/hdc6 78453459 160079692- 81626233+ 83 Linux
A sector is 512 bytes and a block is 1024 bytes. Both give the same start for hdc6: 78453459 * 1024 = 80336342016 = 156906918 * 512.
Well, anyway, we allow a big 10 GB safety margin: 80336342016 - 10*1024^3 = 69598923776, which we round to 70000000000 (close enough, isn't it? ;-). That still leaves roughly 9.6*1024^3 bytes of margin.
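The unit juggling above can be checked with a few lines of Python. This sketch only redoes the arithmetic from the fdisk and sfdisk listings. Two things fall out of it. Counting fdisk cylinders from 1 and sfdisk cylinders from 0 makes 9768 and 9767 land on the same cylinder boundary. The sector-exact start also lies exactly one track (63 sectors) past that boundary, which is consistent with the "+" that sfdisk prints after 9767.

```python
# Reconcile the start of hdc6 as reported by fdisk and sfdisk in different units.
SECTOR = 512                    # bytes per sector
CYLINDER = 255 * 63 * SECTOR    # heads * sectors/track * 512 = 8225280 bytes
GIB = 1024 ** 3


def fdisk_cylinder_to_bytes(cyl: int) -> int:
    """Old fdisk numbers cylinders from 1."""
    return (cyl - 1) * CYLINDER


def sfdisk_cylinder_to_bytes(cyl: int) -> int:
    """sfdisk -uC numbers cylinders from 0."""
    return cyl * CYLINDER


start_from_sectors = 156906918 * SECTOR   # sfdisk, sector units
start_from_blocks = 78453459 * 1024       # sfdisk, 1 KiB block units

# Both cylinder readings point at the same cylinder boundary...
assert fdisk_cylinder_to_bytes(9768) == sfdisk_cylinder_to_bytes(9767) == 80336309760
# ...and the sector and block readings agree on the exact start.
assert start_from_sectors == start_from_blocks == 80336342016

# The exact start is one track past the cylinder boundary.
track_offset = start_from_sectors - sfdisk_cylinder_to_bytes(9767)
print(track_offset, track_offset // SECTOR)         # 32256 63

# Safety margin used for dd_rescue's -s option.
print(start_from_sectors - 10 * GIB)                # 69598923776
print((start_from_sectors - 70000000000) / GIB)     # margin actually left: roughly 9.6
```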
dd_rescue'ing
Based on the Hard disk recovery tools article, I decided (this time) to use dd_rescue.
dd_rescue -v -s 70000000000 /dev/hdc /mnt/store/rescue -l ~/rescue.errors -o ~/rescue.badblock
After a few hours (I went to sleep in the meantime), it finishes. The result is not that bad:
dd_rescue: (info): /dev/hdc (160086528.0k): EOF
Summary for /dev/hdc -> /mnt/store/rescue:
dd_rescue: (info): ipos: 160086528.0k, opos: 160086528.0k, xferd: 81633069.0k
errs: 650, errxfer: 325.0k, succxfer: 81632744.0k
+curr.rate: 6646kB/s, avg.rate: 2781kB/s, avg.load: -6.9%
Only 325k of bad blocks, which is roughly 0.0004 % of the data transferred :-).
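To make the errs/errxfer figures concrete, here is a simplified Python model of a rescue copy. It is not dd_rescue's code. It only assumes the behaviour suggested by dd_rescue's options: data is read in large chunks (the "soft" block size, taken here as 64 KiB), and a chunk that fails is retried in 512-byte "hard" blocks. Blocks that still fail are skipped, or written as zeros when zero_fill mimics the -A option. The 512-byte retry size fits the summary above: 650 errors * 512 bytes = 325 KiB. FlakyDisk is a hypothetical in-memory stand-in for the failing drive. Recording bad blocks as input position divided by 512 is an assumption about the bad block file's units.

```python
import io

HARD = 512        # retry size after a failed read (bytes)
SOFT = 64 * 1024  # normal read size (bytes)


def read_at(src, pos, n):
    src.seek(pos)
    return src.read(n)


def rescue_copy(src, dst, ipos, opos, length, zero_fill=False):
    """Copy `length` bytes from src at ipos to dst at opos, tolerating read errors.

    Returns (errs, errxfer, bad_blocks); bad_blocks are input positions in HARD units.
    """
    errs, errxfer, bad = 0, 0, []
    done = 0
    while done < length:
        n = min(SOFT, length - done)
        try:
            chunk = read_at(src, ipos + done, n)
        except OSError:
            chunk = None
        if chunk is not None:
            dst.seek(opos + done)
            dst.write(chunk)
        else:
            # Retry only the failed chunk, one small block at a time.
            for off in range(0, n, HARD):
                m = min(HARD, n - off)
                try:
                    piece = read_at(src, ipos + done + off, m)
                except OSError:
                    errs += 1
                    errxfer += m
                    bad.append((ipos + done + off) // HARD)
                    if not zero_fill:
                        continue          # leave this part of the output untouched
                    piece = bytes(m)
                dst.seek(opos + done + off)
                dst.write(piece)
        done += n
    return errs, errxfer, bad


class FlakyDisk(io.BytesIO):
    """Hypothetical in-memory disk whose reads fail if they touch a listed sector."""

    def __init__(self, data, bad_sectors):
        super().__init__(data)
        self.bad = set(bad_sectors)

    def read(self, n=-1):
        start = self.tell()
        if n < 0:
            n = len(self.getvalue()) - start
        touched = range(start // HARD, (start + n + HARD - 1) // HARD)
        if any(s in self.bad for s in touched):
            raise OSError(5, "Input/output error")
        return super().read(n)


data = bytes(range(256)) * 4096                   # 1 MiB "disk"
disk = FlakyDisk(data, bad_sectors={5, 6, 900})
out = io.BytesIO()
print(rescue_copy(disk, out, ipos=0, opos=0, length=len(data)))  # (3, 1536, [5, 6, 900])
```

Skipped blocks are never written, so in a fresh output file they read back as zeros (or stay holes in a sparse file). The same thing happens to the region before -s when ipos and opos are equal.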
First try
The first idea is to find exactly where the partition is, copy that partition into its own file, and run reiserfsck on it.
PROs:
- straightforward
CONs:
- you have to find the exact beginning of the partition
- you need roughly twice as much free disk space (for the copy)
- you cannot use the bad block file from dd_rescue directly; you would need to translate it, and that's not that easy
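Translating the list is mostly arithmetic. This is a minimal sketch resting on two assumptions that are not verified here against the tools' documentation. The first is that the bad block file holds one decimal sector number per line, in 512-byte units counted from the start of the disk. The second is that reiserfsck wants 4096-byte filesystem block numbers counted from the start of the filesystem. The 512-byte unit is at least consistent with the third try further down, where sector 128 is said to hold the superblock (128 * 512 = 65536). PART_START and PART_SIZE are values found later in this article: the start located with od, and the block count reported by file. The helper names are hypothetical.

```python
def parse_sector_list(lines):
    """Parse one decimal sector number per line, ignoring blank lines."""
    return [int(line) for line in lines if line.strip()]


def disk_to_fs_blocks(sectors, part_start, part_size, sector=512, fs_block=4096):
    """Map disk-relative sector numbers to filesystem-relative block numbers.

    part_start and part_size are in bytes. Sectors outside the partition are
    dropped, and sectors falling into the same filesystem block give one entry.
    """
    blocks = set()
    for s in sectors:
        rel = s * sector - part_start
        if 0 <= rel < part_size:
            blocks.add(rel // fs_block)
    return sorted(blocks)


PART_START = 80336387072        # filesystem start found with od
PART_SIZE = 20406544 * 4096     # block count * block size reported by `file`

bad = parse_sector_list(["156907005\n", "156907134\n", "156907135\n", "\n"])
print(disk_to_fs_blocks(bad, PART_START, PART_SIZE))   # [16]
```

Here the first sector lies just before the partition and is dropped. The other two both fall into block 16, the block that holds the superblock. With part_start=0, the same function would apply to a list recorded against the partition device itself, as in the third try.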
Find the partition
As I didn't specify an output position (-S), dd_rescue wrote each block at the same offset it was read from, so the copy has the same size as /dev/hdc:
# ls -l
-rw-r----- 1 root root 163928604672 Aug  1 06:12 rescue
The whole beginning was not copied, though. That part of the file is a hole, so the space actually used on disk is only about half of that:
# du -shx rescue
79G     rescue
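The gap between the two numbers is the usual sparse-file effect. ls -l reports the apparent size (st_size), while du counts the blocks actually allocated (st_blocks). A small sketch reproduces it on Linux. It assumes the temporary directory sits on a filesystem that supports sparse files.

```python
import os
import tempfile

# Write a single byte 100 MiB into an empty file, the way a copy that starts at a
# large output position never touches the beginning of the file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(100 * 1024 * 1024)
    f.write(b"\x01")
    path = f.name

st = os.stat(path)
apparent = st.st_size              # what `ls -l` reports
allocated = st.st_blocks * 512     # what `du` counts: st_blocks is in 512-byte units
print(apparent, allocated)
os.remove(path)
```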
Now I need to run reiserfsck. For this, I would like to 1) check the exact start address of the hdc6 partition, and 2) copy the relevant data to another file.
I first tried to use hexdump.
# hexdump -s 80336342016 rescue
7fffffff 0000 0000 0000 0000 0000 0000 0000 0000
*
Aha, you see? My hexdump can't handle offsets beyond 7fffffff. What a shame. After a while I figured out there's another tool, called od. od is great: it can handle such large offsets. But it mostly knows about octal. How odd. For example, I never managed to make it print the offset in anything other than octal. (For the record, GNU od does accept -A x for hexadecimal or -A d for decimal offsets.)
The hdc6 partition held a ReiserFS filesystem. A ReiserFS filesystem starts with 64 KiB of "nothing", followed by a header (the superblock). Here is an example from a real, working partition:
# od -x /dev/sdb1 | less
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 1470 02e9 ee8a 02e8 2013 0000 0012 0000
0200020 0000 0000 2000 0000 0400 0000 39a4 13fb
0200040 0384 0000 001e 0000 0000 0000 1000 03cc
0200060 0002 0002 6552 7349 7245 4632 0073 0000
(-x gives hexadecimal output, but only for the content, not for the offsets.) The words "6552 7349 7245 4632" stand for "ReIsEr2F", the start of the header's magic signature "ReIsEr2Fs". od -x prints 2-byte words, so on a little-endian machine the two ASCII characters of each word appear swapped. You can search the web for "reiserfs magic" to find more information about that. (here is an example)
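To double-check the reading of such a dump, the Python sketch below turns od -x lines back into bytes and decodes the start of the superblock. It assumes the dump was made on a little-endian machine. The field offsets are my reading of the ReiserFS 3.x superblock layout: block count, free blocks and root block as 32-bit words, then eight 32-bit journal parameters, the block size at byte 44 and the magic at byte 52. Fed with the four lines found on the rescue file in the next step, it gives values that agree with what file and reiserfsck report in this article: 20406544 blocks of 4096 bytes, a journal starting at block 18, objectid max 972 and 623 bitmaps. The function names are hypothetical.

```python
import struct


def od_words_to_bytes(lines):
    """Rebuild raw bytes from `od -x` lines: an offset column, then 16-bit words.

    Assumes a little-endian dump, where each printed word has its two bytes
    swapped relative to file order.
    """
    out = bytearray()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:            # skip "*" and "..." lines
            continue
        for word in fields[1:]:
            out += struct.pack("<H", int(word, 16))
    return bytes(out)


def parse_reiserfs_sb(sb):
    """Decode a few fields of a ReiserFS 3.x superblock (all little-endian)."""
    block_count, free_blocks, root_block = struct.unpack_from("<3I", sb, 0)
    journal = struct.unpack_from("<8I", sb, 12)      # journal parameters
    blocksize, oid_maxsize = struct.unpack_from("<2H", sb, 44)
    return {
        "block_count": block_count,
        "free_blocks": free_blocks,
        "root_block": root_block,
        "journal_first_block": journal[0],
        "journal_size": journal[2],
        "journal_max_trans": journal[3],
        "blocksize": blocksize,
        "oid_maxsize": oid_maxsize,
        "magic": sb[52:62].rstrip(b"\x00"),
        # each bitmap block tracks blocksize * 8 blocks (ceiling division)
        "bitmaps": -(-block_count // (blocksize * 8)),
    }


dump = """\
1126433176000 6110 0137 e0df 00ba 1b56 00a5 0012 0000
1126433176020 0000 0000 2000 0000 0400 0000 e490 2559
1126433176040 0384 0000 001e 0000 0000 0000 1000 03cc
1126433176060 0002 0002 6552 7349 7245 4632 0073 0000""".splitlines()

sb = parse_reiserfs_sb(od_words_to_bytes(dump))
print(sb["magic"], sb["block_count"], sb["blocksize"], sb["bitmaps"])
# b'ReIsEr2Fs' 20406544 4096 623
```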
Doing the same on my rescue file, I find the beginning of the partition. I search (using less) for the string "6552 7349 7245 4632" and find this:
# od -x -j 80336342016 rescue | less
...
1126433176000 6110 0137 e0df 00ba 1b56 00a5 0012 0000
1126433176020 0000 0000 2000 0000 0400 0000 e490 2559
1126433176040 0384 0000 001e 0000 0000 0000 1000 03cc
1126433176060 0002 0002 6552 7349 7245 4632 0073 0000
This puts the start of the header at (octal) 1126433176000 = (dec) 80336452608. The start of the partition is therefore at (octal) 01126433176000 - 0200000 = 01126432776000, which is 80336387072 in decimal. This is not what we had estimated: we were off by 80336387072 - 80336342016 = 45056 = 0xB000 bytes (88 sectors). Strange...
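Searching with less works, but the same search can be scripted. The sketch below uses a hypothetical helper, not an existing tool. It scans a file from a given offset for the ReiserFS 3.6 magic "ReIsEr2Fs", then subtracts the 52-byte offset of the magic inside the superblock and the 64 KiB in front of the superblock. It also redoes the octal arithmetic above. The first hit is not guaranteed to be the real superblock, since the same string could appear inside a file or a journal block. It still needs the kind of sanity check done above.

```python
import io

MAGIC = b"ReIsEr2Fs"     # ReiserFS 3.6 magic
SB_OFFSET = 64 * 1024    # superblock offset from the start of the filesystem
MAGIC_OFFSET = 52        # offset of the magic inside the superblock


def find_reiserfs_start(f, start, limit, chunk=1 << 20):
    """Return the filesystem start implied by the first MAGIC in f[start:limit], or None."""
    pos, tail = start, b""
    while pos < limit:
        f.seek(pos)
        data = f.read(min(chunk, limit - pos))
        if not data:
            break
        buf = tail + data
        i = buf.find(MAGIC)
        if i >= 0:
            magic_pos = pos - len(tail) + i
            return magic_pos - MAGIC_OFFSET - SB_OFFSET
        tail = buf[-(len(MAGIC) - 1):]   # overlap, so a match split across chunks is found
        pos += len(data)
    return None


# The od arithmetic from the text, redone in Python.
header = int("1126433176000", 8)
print(header, header - 0o200000, header - 0o200000 - 80336342016)
# 80336452608 80336387072 45056

# Self-check on a synthetic image whose filesystem starts at byte 1000;
# chunk=66590 deliberately splits the magic across two reads.
image = bytes(1000 + SB_OFFSET + MAGIC_OFFSET) + MAGIC + bytes(100)
print(find_reiserfs_start(io.BytesIO(image), 0, len(image), chunk=66590))  # 1000
```

On the real copy, a call such as find_reiserfs_start(open("rescue", "rb"), 70000000000, 163928604672) would scan everything after the safety margin. The sparse beginning of the file is never read.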
So now I extract the partition's data into another file:
dd_rescue -v -s 80336387072 -S 0 rescue rescue.partition6
As soon as the copy has started, I check the new file from another console with file:
# file *
rescue:            data
rescue.partition6: ReiserFS V3.6 block size 4096 (mounted or unclean) num blocks 20406544 r5 hash
So it seems ok! :-)
(the copy takes some time)
Reiserfsck'ing
Now I can run reiserfsck on this last file and try to repair it.
First, note that since we did some weird things with the partition, we should rebuild the superblock:
reiserfsck --rebuild-sb rescue.partition6
And indeed, reiserfsck asks
rebuild-sb: You either have a corrupted journal or have just changed the start of the partition with some partition table editor. If you are sure that the start of the partition is ok, rebuild the journal header. Do you want to rebuild the journal header? (y/n)[n]:
Important: I first tried to recover without this first step (--rebuild-sb). Things seemed to go well, but most of the files were lost.
Then we rebuild the whole tree, asking reiserfsck to scan the whole partition in case some leaves are merely 'disconnected' from the directory tree. We can't use the --badblocks option here, because the entries in dd_rescue's bad block file are counted from the start of the disk, not from the start of the partition.
reiserfsck --rebuild-tree --scan-whole-partition rescue.partition6
This one is long and quite verbose.
At the end, it printed:
Flushing..finished
Objects without names 4442
Empty lost dirs removed 182
Dirs linked to /lost+found: 182
Dirs without stat data found 30
Files linked to /lost+found 4260
Objects having used objectids: 579
files fixed 553
dirs fixed 26
Pass 4 - finished done 5303, 662 /sec
Deleted unreachable items 4167
Flushing..finished
Syncing..finished
I run another check to make sure the file is mountable:
reiserfsck --check rescue.partition6
And then I mount it using the Linux loopback driver:
# mount -o loop -t reiserfs rescue.partition6 /mnt/rescued/
Conclusion
90.05% of the files ended up in lost+found, with meaningless names (such as 2349_2430). Among the files with real names, a lot were not accurate. One example is a file called 'g3.png' that is 730083328 bytes in size.
Second try
The idea here is to work directly on the copy of hdc. Remember that only the last part of it was copied, so we first need to re-create the partition table on it. Moreover, we don't want to experiment on the only copy we have, so we should copy it before playing with it. Here we use another approach: cowloop lets us mount this precious file while keeping all modifications in a separate file.
PROs:
- you can use the badblock file from dd_rescue
- you don't have to find the start of the partition
CONs:
- slightly harder to do
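The copy-on-write idea behind cowloop can be pictured with a toy model. A read of a block comes from the separate change store if that block was ever written, otherwise from the untouched original image. Writes only ever go to the change store. This is not cowloop's code or interface, just an in-memory illustration. CowOverlay and its methods are hypothetical names.

```python
class CowOverlay:
    """Toy copy-on-write view of a read-only base image."""

    def __init__(self, base: bytes, block: int = 4096):
        self.base = base            # the precious image, never modified
        self.block = block
        self.changed = {}           # block number -> new contents (the separate "cow file")

    def read_block(self, n: int) -> bytes:
        if n in self.changed:
            return self.changed[n]
        start = n * self.block
        return self.base[start:start + self.block]

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != self.block:
            raise ValueError("this sketch only writes whole blocks")
        self.changed[n] = bytes(data)


image = bytes(3 * 4096)
cow = CowOverlay(image)
cow.write_block(1, b"\xff" * 4096)
print(cow.read_block(1)[:2], cow.read_block(0)[:2], image[4096:4098])
# b'\xff\xff' b'\x00\x00' b'\x00\x00'
```

Throwing the change store away brings back the original image, which is the whole point when experimenting with reiserfsck.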
Copy the data
I put in a disk as /dev/sda that is at least as big as the bad disk (160G here, sda is 300G).
Then I copy the bad disk the same way as before, starting at 70000000000 as estimated at the beginning:
dd_rescue -v -s 70000000000 /dev/hdc /dev/sda -l ~/rescue.errors -o ~/rescue.badblock
which ends with:
dd_rescue: (info): /dev/hdc (160086528.0k): EOF
Summary for /dev/hdc -> /dev/sda:
dd_rescue: (info): ipos: 160086528.0k, opos: 160086528.0k, xferd: 91727153.0k
errs: 640, errxfer: 320.0k, succxfer: 91726833.0k
+curr.rate: 20597kB/s, avg.rate: 2244kB/s, avg.load: -3.8%
I also copy the very beginning of the disk, letting dd_rescue run for a few seconds, to get the partition table:
dd_rescue -v /dev/hdc /dev/sda
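Unfortunately, that is not enough. The MBR in the first sector only describes the primary partitions. The logical ones (hdc5, hdc6) are chained through extended boot records inside the extended partition, and the first of these sits at the start of hdc4, about 30 GB into the disk. Neither copy reached that area, so fdisk parses whatever happens to be at that spot on /dev/sda: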
# fdisk -l /dev/sda
Warning: ignoring extra data in partition table 5
Warning: ignoring extra data in partition table 5
Warning: ignoring extra data in partition table 5
Warning: invalid flag 0x5955 of partition table 5 will be corrected by w(rite)
Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device    Boot     Start         End      Blocks   Id  System
/dev/sda1  *           1        1217     9775521   fd  Linux raid autodetect
/dev/sda2           1218        3650    19543072+  fd  Linux raid autodetect
/dev/sda3           3651        3687      297202+  82  Linux swap / Solaris
/dev/sda4           3688       19929   130463865    5  Extended
/dev/sda5  ?      113514      228064   920121511+  54  OnTrackDM6
so I use fdisk to re-create the partitions exactly as on /dev/hdc:
...
# fdisk -l /dev/sda
Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device    Boot     Start         End      Blocks   Id  System
/dev/sda1  *           1        1217     9775521   fd  Linux raid autodetect
/dev/sda2           1218        3650    19543072+  fd  Linux raid autodetect
/dev/sda3           3651        3687      297202+  82  Linux swap / Solaris
/dev/sda4           3688       19929   130463865    5  Extended
/dev/sda5           3688        9767    48837568+  83  Linux
/dev/sda6           9768       19929    81626233+  83  Linux
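For reference, the partition entries that fdisk decodes can be read with a few lines of Python. This sketches the classic MBR layout, not fdisk's code:

- each table sits in a 512-byte sector and holds four 16-byte entries starting at byte 446
- within an entry, the type is at byte 4, and the first LBA and sector count are little-endian 32-bit values at bytes 8 and 12
- the sector ends with the signature bytes 55 AA
- in an extended boot record, the first entry is the logical partition (relative to that record) and the second links to the next record (relative to the start of the extended partition)

The "invalid flag 0x5955" warning fdisk printed for /dev/sda appears to be this signature slot holding garbage. fdisk then warns and parses anyway, while this sketch refuses. make_sector and the dictionary-backed disk are hypothetical helpers for a synthetic test image.

```python
import struct

SECTOR = 512
EXTENDED_TYPES = {0x05, 0x0F, 0x85}   # common extended-partition type codes


def entries(sector):
    """Return the four (type, first_lba, num_sectors) entries of an MBR/EBR sector."""
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 55 AA signature: not a partition table")
    result = []
    for i in range(4):
        off = 446 + 16 * i
        ptype = sector[off + 4]
        first, count = struct.unpack_from("<II", sector, off + 8)
        result.append((ptype, first, count))
    return result


def logical_partitions(read_sector):
    """Follow the EBR chain; return (type, absolute_start_lba, num_sectors) tuples."""
    ext = next((e for e in entries(read_sector(0)) if e[0] in EXTENDED_TYPES), None)
    if ext is None:
        return []
    ext_start, ebr, seen, found = ext[1], ext[1], set(), []
    while ebr not in seen:                       # guard against looping chains
        seen.add(ebr)
        logical, link = entries(read_sector(ebr))[:2]
        if logical[0] != 0:
            found.append((logical[0], ebr + logical[1], logical[2]))
        if link[0] not in EXTENDED_TYPES:
            break
        ebr = ext_start + link[1]
    return found


def make_sector(parts):
    """Build a synthetic MBR/EBR sector from (type, first_lba, num_sectors) tuples."""
    s = bytearray(SECTOR)
    for i, (ptype, first, count) in enumerate(parts):
        off = 446 + 16 * i
        s[off + 4] = ptype
        struct.pack_into("<II", s, off + 8, first, count)
    s[510:512] = b"\x55\xaa"
    return bytes(s)


# Tiny fake disk: one primary partition, then an extended one holding two logicals.
disk = {
    0: make_sector([(0x83, 63, 37), (0x05, 100, 1000)]),
    100: make_sector([(0x83, 63, 200), (0x05, 300, 400)]),
    400: make_sector([(0x83, 63, 100)]),
}
print(logical_partitions(disk.__getitem__))   # [(131, 163, 200), (131, 463, 100)]
```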
Reiserfsck'ing
Restoring the superblock is the first, and difficult, task:
reiserfsck --rebuild-sb /dev/sda6
says:
reiserfs_open: the reiserfs superblock cannot be found on /dev/sda6.
what the version of ReiserFS do you use[1-4]
(1) 3.6.x
(2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)
(3) < 3.5.9 converted to new format (don't choose if unsure)
(4) < 3.5.9 (this is very old format, don't choose if unsure)
(X) exit
I know I'm using 3.6.x, so I answer (1), and then the other questions (block size = 4096).
The final output is:
rebuild-sb: no uuid found, a new uuid was generated (3501e938-e251-4a17-bc70-9ca201d813ed)
rebuild-sb: You either have a corrupted journal or have just changed
the start of the partition with some partition table editor. If you are
sure that the start of the partition is ok, rebuild the journal header.
Do you want to rebuild the journal header? (y/n)[n]: y
Reiserfs super block in block 16 on 0x806 of format 3.6 with standard journal
Count of blocks on the device: 20406544
Number of bitmaps: 623
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0
Root block: 0
Filesystem is NOT clean
Tree height: 0
Hash function used to sort names: not set
Objectid map size 0, max 972
Journal parameters:
Device [0x0]
Magic [0x0]
Size 8193 blocks (including 1 for journal header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x1:
some corruptions exist.
sb_version: 2
inode generation number: 0
UUID: 3501e938-e251-4a17-bc70-9ca201d813ed
LABEL:
Set flags in SB:
Then, as usual, we rebuild the whole tree:
reiserfsck --rebuild-tree --scan-whole-partition /dev/sda6
and mount it.
Third try
I put in a disk as /dev/sda that is at least as big as the bad disk (160G here, sda is 300G).
Copy the partition table
I do it by hand. First, I display the partition table of the bad disk:
fdisk -l /dev/hdc
Then I edit the one on the new hard disk:
fdisk /dev/sda
At the end, I check that the output is the same, at least for the partition being rescued:
fdisk -l /dev/sda
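Copy the data
Now I copy the bad disk much as before, but this time from the partition device /dev/hdc6 straight to /dev/sda6. Everything copied therefore lies past the 70000000000 mark estimated at the beginning. I use the -r flag because the beginning of the partition is definitely dead, so I prefer copying from the end downward. -A makes dd_rescue write zeroed blocks in place of unreadable ones instead of skipping them: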
dd_rescue -r -A -v /dev/hdc6 /dev/sda6 -l ~/rescue.errors -o ~/rescue.badblock
Reiserfsck'ing
The bad news is that sector 128 is dead :-(. That is byte 128 * 512 = 65536 of the partition, which holds the superblock. I remove the range 128-500 from the bad block file.
This time, restoring the superblock works (you need to answer several questions):
reiserfsck --rebuild-sb /dev/sda6
Then, as usual, we rebuild the whole tree, this time passing the bad block file:
reiserfsck --rebuild-tree --scan-whole-partition -B ~/rescue.badblock /dev/sda6
and mount it.