Hard Disk Recover 2

From Freehackers

Following the previous article, here is yet another hard disk recovery.

The problem here was not the bad sectors (luckily there were few), but the tools. Contrary to what people usually think, some Unix tools are very bad. Let me show you.

So, for some reason (an electrical shock), I have a hard disk which starts to have bad blocks. The server almost freezes each time Linux tries to access this hard disk. I remove it and install it in a dedicated computer with its own system disk, a brand new hard disk for the rescued data, and the bad disk. I put the bad disk alone on its very own IDE cable.

Estimation of the beginning of the partition

The first thing I do is:

 # fdisk -l /dev/hdc
 
 Disk /dev/hdc: 163.9 GB, 163928604672 bytes
 255 heads, 63 sectors/track, 19929 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 
    Device Boot      Start         End      Blocks   Id  System
 /dev/hdc1   *           1        1217     9775521   fd  Linux raid autodetect
 /dev/hdc2            1218        3650    19543072+  fd  Linux raid autodetect
 /dev/hdc3            3651        3687      297202+  82  Linux swap / Solaris
 /dev/hdc4            3688       19929   130463865    5  Extended
 /dev/hdc5            3688        9767    48837568+  fd  Linux raid autodetect
 /dev/hdc6            9768       19929    81626233+  83  Linux

I don't care about the swap/raid partitions. The only one I'm interested in is hdc6. Funnily, the kernel can't find it:

 # ls /dev/hdc*
 /dev/hdc  /dev/hdc1  /dev/hdc2  /dev/hdc3  /dev/hdc4  /dev/hdc5

You can ask the kernel to re-read the partition table. Several tools (partprobe, hdparm) can do that, for example:

 blockdev --rereadpt /dev/hdc

But in this case, this doesn't work.

OK, so now the problem is to know where on the hard disk the partition starts. Byte-wise, that is. The problem is that the tools do not agree on that. The previous fdisk output, for example, says hdc6 starts on cylinder 9768, the size of a cylinder being 8225280 bytes, which would mean 9768*8225280 = 80344535040.

sfdisk -uC /dev/hdc (-uC displays cylinder numbers) says:

 /dev/hdc6       9767+  19928   10162-  81626233+  83  Linux

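They don't agree on the cylinder number! (But they do agree on the size of a cylinder.) Here fdisk numbers cylinders from 1 while sfdisk numbers them from 0, and the '+' in the sfdisk output marks a value that has been rounded down.

So this one gives the start of hdc6 at: 9767 * 8225280 = 80336309760.

sfdisk can also output in sectors: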

 /dev/hdc6     156906918 320159384  163252467  83  Linux

or in blocks:

 /dev/hdc6     78453459  160079692-  81626233+  83  Linux

The size of a sector is 512 bytes and the size of a block is 1024 bytes, so both give the same start for hdc6: 78453459 * 1024 = 80336342016 = 156906918 * 512.

Well, anyway, we allow for a big 10 GB safety margin, and we use 80336342016-10*(1024)^3 = 69598923776 ≈ 70000000000 (isn't it? ;-)
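
The arithmetic above can be replayed with plain shell arithmetic. This is only a sketch: the variable names are mine, the figures are copied from the fdisk and sfdisk outputs above, and it assumes a shell with 64-bit integers.

```shell
#!/bin/sh
# Candidate byte offsets for the start of hdc6, from the numbers printed above.
cyl_bytes=$((255 * 63 * 512))            # 8225280 bytes per cylinder
from_fdisk=$((9768 * cyl_bytes))         # fdisk cylinder number taken literally
from_sfdisk_cyl=$((9767 * cyl_bytes))    # sfdisk -uC cylinder number
from_sectors=$((156906918 * 512))        # sfdisk sector output
from_blocks=$((78453459 * 1024))         # sfdisk block output
margin=$((10 * 1024 * 1024 * 1024))      # the 10 GB safety margin

echo "fdisk cylinders  : $from_fdisk"
echo "sfdisk cylinders : $from_sfdisk_cyl"
echo "sfdisk sectors   : $from_sectors"
echo "sfdisk blocks    : $from_blocks"
echo "start - margin   : $((from_sectors - margin))"
```

The sector and block figures agree with each other, and sit 32256 bytes (63 sectors of 512 bytes) above the figure derived from the sfdisk cylinder number.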

dd_rescueing

Based on the "Hard disk recovery tools" article, I decided (this time) to use dd_rescue.

 dd_rescue -v  -s 70000000000 /dev/hdc  /mnt/store/rescue -l ~/rescue.errors -o ~/rescue.badblock

After a few hours (I went to sleep meanwhile), it finished. The result is not that bad:

 dd_rescue: (info): /dev/hdc (160086528.0k): EOF
 Summary for /dev/hdc -> /mnt/store/rescue:
 dd_rescue: (info): ipos: 160086528.0k, opos: 160086528.0k, xferd:  81633069.0k
                    errs:    650, errxfer:       325.0k, succxfer:  81632744.0k
              +curr.rate:     6646kB/s, avg.rate:     2781kB/s, avg.load: -6.9%

Only 325 kB of bad blocks, which is roughly 0.0004% :-).
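
The percentage comes from the errxfer and xferd figures of the summary. A small sketch, assuming awk is available; bad_percent is a name I made up for the illustration.

```shell
#!/bin/sh
# Share of unreadable data, from the errxfer and xferd figures (both in kB).
# $1 = errxfer, $2 = xferd
bad_percent() {
    awk -v bad="$1" -v total="$2" 'BEGIN { printf "%.4f\n", 100 * bad / total }'
}

bad_percent 325.0 81633069.0
```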

First try

The first idea is to find exactly where the partition is, copy this partition, and use reiserfsck on it.

PROs:

  • straightforward

CONs:

  • you have to find the exact beginning of the partition
  • you need roughly twice as much free disk space (for the copy)
  • you cannot use the badblock file from dd_rescue as is: you would need to translate it, and that is not that easy
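
For the last point, here is a rough sketch of what such a translation could look like. It rests on assumptions that should be checked against your versions of the tools: that the dd_rescue list holds one block number per line counted in 512-byte units from the start of the disk, and that the consumer wants numbers in 4096-byte filesystem blocks counted from the start of the partition. The function name and its parameters are hypothetical.

```shell
#!/bin/sh
# Translate whole-disk bad block numbers into partition-relative block numbers.
# $1 = byte offset of the partition on the disk
# $2 = size in bytes of one unit in the input list   (assumed default: 512)
# $3 = block size in bytes wanted in the output list (assumed default: 4096)
# Reads one block number per line on stdin, writes the translated list to stdout.
translate_badblocks() {
    part_start=$1
    in_bs=${2:-512}
    out_bs=${3:-4096}
    while read -r blk; do
        [ -n "$blk" ] || continue                  # ignore empty lines
        byte=$((blk * in_bs))
        # Blocks located before the partition do not belong to it.
        [ "$byte" -ge "$part_start" ] || continue
        echo $(( (byte - part_start) / out_bs ))
    done | sort -n -u                              # several sectors can share one block
}
```

It would be used as `translate_badblocks 80336387072 < ~/rescue.badblock > rescue.badblock.part`, with the partition start found further down; the output file name is only an example.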

Find the partition

As I didn't specify any position for the output file, dd_rescue produced a copy with the same size as /dev/hdc:

 # ls -l
 -rw-r----- 1 root root 163928604672 Aug  1 06:12 rescue

The beginning was not copied though, so the file is sparse. The actual space used on the hard disk is only about half of this:

 du -shx rescue
 79G     rescue
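
The same comparison can be made in bytes. A minimal sketch, assuming GNU stat (the -c format option); the two function names are mine.

```shell
#!/bin/sh
# Compare the apparent size of a file with the space it really occupies.
# Assumes GNU coreutils stat: %s = apparent size in bytes,
# %b = number of allocated blocks, %B = size in bytes of each such block.
apparent_bytes() {
    stat -c '%s' "$1"
}

allocated_bytes() {
    stat -c '%b %B' "$1" | {
        read -r blocks block_size
        echo $((blocks * block_size))
    }
}
```

Running `apparent_bytes rescue` and `allocated_bytes rescue` gives the two figures that ls and du show above; a large gap between them means the file has holes.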

Now I need to use reiserfsck. For this, I would like to 1) check the exact start address of the hdc6 partition, 2) copy the relevant data to another file.

I first tried to use hexdump.

 hexdump  -s 80336342016  rescue
 7fffffff 0000 0000 0000 0000 0000 0000 0000 0000
 *

Aha, you see? hexdump can't handle an offset larger than 7fffffff. What a shame. After a while I figured out there's another tool, called 'od'. od is great: it can handle such long offsets. But it mostly knows about octal. How odd. For example, I never managed to make it print the offsets in anything other than octal (the -A option is meant for that: -A d prints them in decimal, -A x in hex).

The partition on hdc6 was using the reiserfs filesystem. The very beginning of a reiserfs filesystem is made of 64k of "nothing" and then a header. Here is an example on a real, working, partition.

 od  -x /dev/sdb1 |less
 0000000 0000 0000 0000 0000 0000 0000 0000 0000
 *
 0200000 1470 02e9 ee8a 02e8 2013 0000 0012 0000
 0200020 0000 0000 2000 0000 0400 0000 39a4 13fb
 0200040 0384 0000 001e 0000 0000 0000 1000 03cc
 0200060 0002 0002 6552 7349 7245 4632 0073 0000

(-x selects hex output, but only for the content, not for the offsets.) The "6552 7349 7245 4632" stands for "ReIsEr2F", the start of the magic signature of the header, in ASCII, shown as little-endian 16-bit words. You can google for "reiserfs magic" to find more information about that.
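
Instead of reading od output by eye, the signature can also be searched for directly. The following is a sketch under assumptions: it relies on GNU grep (-a, -b, -o), it takes the layout from the dump above (signature 52 bytes into the header, header 64 KiB after the start of the partition), and the function name is made up.

```shell
#!/bin/sh
# Print a candidate partition start for every "ReIsEr2F" signature in an image.
# $1 = image file, $2 = byte offset to start searching from (default 0).
# Assumption: the signature sits 52 bytes into the header, and the header
# starts 65536 bytes after the beginning of the partition.
find_reiserfs_starts() {
    image=$1
    skip=${2:-0}
    # tail -c +N starts at byte N (counted from 1);
    # grep -abo prints "offset:match" with offsets relative to its input.
    tail -c "+$((skip + 1))" "$image" |
        LC_ALL=C grep -abo 'ReIsEr2F' |
        while IFS=: read -r off rest; do
            echo $((skip + off - 65536 - 52))
        done
}
```

Passing a skip value close to the expected start keeps the amount of data grep has to scan small. Every hit is only a candidate and still has to be checked.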

Running od on my rescue file, I look for the beginning of the partition. I search (using less) for the string "6552 7349 7245 4632" and find this:

 od -x -j 80336342016  rescue |less
 ...
 1126433176000 6110 0137 e0df 00ba 1b56 00a5 0012 0000
 1126433176020 0000 0000 2000 0000 0400 0000 e490 2559
 1126433176040 0384 0000 001e 0000 0000 0000 1000 03cc
 1126433176060 0002 0002 6552 7349 7245 4632 0073 0000

This puts the start of the header at (octal) 1126433176000 = (decimal) 80336452608, and hence the start of the partition at (octal) 01126433176000 - 0200000 = 01126432776000, in decimal 80336387072. This is not what we had previously estimated: we were off by 80336387072 - 80336342016 = 45056 = 0xB000. Strange...

So now I extract the data of the partition into another file:
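
The octal juggling can be left to the shell, since shell arithmetic reads a number with a leading 0 as octal. A small sketch; the variable names are mine and the figures are the ones from the od output and the sfdisk block output.

```shell
#!/bin/sh
# Shell arithmetic treats a leading 0 as octal, so od's offsets can be used as is.
header=$((01126433176000))          # offset of the header line printed by od
part_start=$((header - 0200000))    # the header sits 64 KiB (octal 200000) into the partition
estimate=$((78453459 * 1024))       # start computed from the sfdisk block output

echo "header          : $header"
echo "partition start : $part_start"
echo "difference      : $((part_start - estimate))"
printf 'difference, hex : 0x%X\n' "$((part_start - estimate))"
```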

 dd_rescue  -v -s 80336387072 -S 0 rescue  rescue.partition6

The first thing I do once it has started, from another console, is to check with file:

 # file *
 rescue:            data
 rescue.partition6: ReiserFS V3.6 block size 4096 (mounted or unclean) num blocks 20406544 r5 hash

So it seems ok! :-)

(the copy takes some time)

Reiserfsck'ing

Now I can use reiserfsck on this last file and try to restore it.

First, just note that as we made some weird changes to the partition, we should rebuild the superblock:

 reiserfsck --rebuild-sb rescue.partition6

And indeed, reiserfsck asks

 rebuild-sb: You either have a corrupted journal or have just changed
 the start of the partition with some partition table editor. If you are
 sure that the start of the partition is ok, rebuild the journal header.
 Do you want to rebuild the journal header? (y/n)[n]:     

Important: I first tried to recover without doing this first step (--rebuild-sb), and things seemed to go well, but most of the files were lost.

Then we rebuild the whole tree, asking for a scan of the whole partition, in case some leaves are just 'disconnected' from the directory graph. We can't use the --badblocks option here, as the bad block numbers are counted from the start of the disk and not from the start of the partition.

 reiserfsck --rebuild-tree --scan-whole-partition  rescue.partition6

This one is long, and quite verbose.

At the end it printed:

 Flushing..finished
         Objects without names 4442
         Empty lost dirs removed 182
         Dirs linked to /lost+found: 182
                 Dirs without stat data found 30
         Files linked to /lost+found 4260
         Objects having used objectids: 579
                 files fixed 553
                 dirs fixed 26
 Pass 4 - finished  done 5303, 662 /sec
         Deleted unreachable items 4167
 Flushing..finished
 Syncing..finished

I do another check to be sure the file is mountable:

 reiserfsck  --check rescue.partition6

And then I mounted it using the Linux loopback driver:

 # mount -o loop -t reiserfs rescue.partition6  /mnt/rescued/

Conclusion

90.05% of the files were in lost+found, with meaningless names (such as 2349_2430). Among the files with real names, a lot were not accurate, like a file called 'g3.png' which has a size of 730083328 bytes.
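
One way to get such a figure on a mounted tree is to count regular files with find. This is only a sketch of a possible method, not necessarily how the number above was obtained; the function name is hypothetical, and file names containing newlines would be miscounted.

```shell
#!/bin/sh
# Percentage of the regular files of a mounted tree that sit under lost+found.
# $1 = mount point (for example /mnt/rescued)
lost_found_percent() {
    total=$(find "$1" -type f | wc -l)
    lost=$(find "$1/lost+found" -type f | wc -l)
    awk -v l="$lost" -v t="$total" 'BEGIN { printf "%.2f\n", (t ? 100 * l / t : 0) }'
}
```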


Second try

The idea here is to work directly on the copy of hdc. Please remember that only the last part of it was copied, so we first need to re-create the partition table on it. Moreover, as we don't want to experiment on the only copy we have, we should copy it before playing with it. Here we shall use another way: we will use cowloop to mount this precious file and keep the modifications in a separate file.

PROs:

  • you can use the badblock file from dd_rescue
  • you dont have to find the start of the partition

CONs:

  • slightly harder to do


Copy the data

I put in a disk as /dev/sda, which is at least the size of the bad disk (160 GB here, sda is 300 GB).

Then I copy the bad disk the same way as previously, starting at 70000000000 as computed at the beginning:

 dd_rescue -v -s 70000000000 /dev/hdc /dev/sda -l ~/rescue.errors -o ~/rescue.badblock

which ends with:

 dd_rescue: (info): /dev/hdc (160086528.0k): EOF
 Summary for /dev/hdc -> /dev/sda:
 dd_rescue: (info): ipos: 160086528.0k, opos: 160086528.0k, xferd:  91727153.0k
                    errs:    640, errxfer:       320.0k, succxfer:  91726833.0k
              +curr.rate:    20597kB/s, avg.rate:     2244kB/s, avg.load: -3.8%

I also copy the very beginning, letting this run for a few seconds only, to get the partition table onto the new disk:

 dd_rescue -v   /dev/hdc  /dev/sda

Unfortunately, that doesn't work:

 # fdisk  -l /dev/sda
 Warning: ignoring extra data in partition table 5
 Warning: ignoring extra data in partition table 5
 Warning: ignoring extra data in partition table 5
 Warning: invalid flag 0x5955 of partition table 5 will be corrected by w(rite)
 
 Disk /dev/sda: 320.0 GB, 320072933376 bytes
 255 heads, 63 sectors/track, 38913 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 
    Device Boot      Start         End      Blocks   Id  System
 /dev/sda1   *           1        1217     9775521   fd  Linux raid autodetect
 /dev/sda2            1218        3650    19543072+  fd  Linux raid autodetect
 /dev/sda3            3651        3687      297202+  82  Linux swap / Solaris
 /dev/sda4            3688       19929   130463865    5  Extended
 /dev/sda5   ?      113514      228064   920121511+  54  OnTrackDM6

so I use fdisk to make the layout the same as on /dev/hdc:

 ...
 # fdisk  -l /dev/sda
 
 Disk /dev/sda: 320.0 GB, 320072933376 bytes
 255 heads, 63 sectors/track, 38913 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 
    Device Boot      Start         End      Blocks   Id  System
 /dev/sda1   *           1        1217     9775521   fd  Linux raid autodetect
 /dev/sda2            1218        3650    19543072+  fd  Linux raid autodetect
 /dev/sda3            3651        3687      297202+  82  Linux swap / Solaris
 /dev/sda4            3688       19929   130463865    5  Extended
 /dev/sda5            3688        9767    48837568+  83  Linux
 /dev/sda6            9768       19929    81626233+  83  Linux

Reiserfsck'ing

Restoring the superblock is the first, difficult task:

 reiserfsck --rebuild-sb /dev/sda6

says:

 reiserfs_open: the reiserfs superblock cannot be found on /dev/sda6.
 
 what the version of ReiserFS do you use[1-4]
         (1)   3.6.x
         (2) >=3.5.9 (introduced in the middle of 1999) (if you use linux 2.2, choose this one)
         (3) < 3.5.9 converted to new format (don't choose if unsure)
         (4) < 3.5.9 (this is very old format, don't choose if unsure)
         (X)   exit

I know I'm using 3.6.x, so I answer (1), and then answer the other questions (block size = 4096).

The final output is:

 rebuild-sb: no uuid found, a new uuid was generated (3501e938-e251-4a17-bc70-9ca201d813ed)
 
 rebuild-sb: You either have a corrupted journal or have just changed
 the start of the partition with some partition table editor. If you are
 sure that the start of the partition is ok, rebuild the journal header.
 Do you want to rebuild the journal header? (y/n)[n]: y
 Reiserfs super block in block 16 on 0x806 of format 3.6 with standard journal
 Count of blocks on the device: 20406544
 Number of bitmaps: 623
 Blocksize: 4096
 Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0
 Root block: 0
 Filesystem is NOT clean
 Tree height: 0
 Hash function used to sort names: not set
 Objectid map size 0, max 972
 Journal parameters:
         Device [0x0]
         Magic [0x0]
         Size 8193 blocks (including 1 for journal header) (first block 18)
         Max transaction length 1024 blocks
         Max batch size 900 blocks
         Max commit age 30
 Blocks reserved by journal: 0
 Fs state field: 0x1:
          some corruptions exist.
 sb_version: 2
 inode generation number: 0
 UUID: 3501e938-e251-4a17-bc70-9ca201d813ed
 LABEL:
 Set flags in SB:


Then, the usual way, we rebuild the whole partition:

 reiserfsck --rebuild-tree --scan-whole-partition  /dev/sda6

and mount it.

Third try

I put in a disk as /dev/sda, which is at least the size of the bad disk (160 GB here, sda is 300 GB).

Copy the partition table

I do it by hand. First I display the one from the bad disk:

 fdisk -l /dev/hdc

Then I edit the one on the new hard disk:

 fdisk /dev/sda

At the end I check that I get the same output, at least for the partition being rescued:

 fdisk -l /dev/sda
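
The comparison can be done on saved listings rather than by eye. A sketch, assuming the listings have the column layout shown in this article (device, optional boot flag '*', start, end, blocks); the function name and the file names in the usage line are mine.

```shell
#!/bin/sh
# Print "start end blocks" for partition number $2 from a saved `fdisk -l` listing $1.
# Assumes the column layout shown in this article.
partition_geometry() {
    awk -v n="$2" '
        $1 ~ ("^/dev/[a-z]+" n "$") {
            i = ($2 == "*") ? 3 : 2      # skip the boot flag column if present
            print $i, $(i + 1), $(i + 2)
        }' "$1"
}
```

For example, after `fdisk -l /dev/hdc > hdc.txt` and `fdisk -l /dev/sda > sda.txt`, the two partitions match if `partition_geometry hdc.txt 6` and `partition_geometry sda.txt 6` print the same line.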

Copy the data

Now I copy the bad disk. I use the '-r' flag because the beginning of the partition is definitely dead, so I prefer copying from the end downward. This time the copy goes from partition to partition, so the 70000000000 start offset used before does not appear in the command:

 dd_rescue -r -A -v /dev/hdc6 /dev/sda6 -l ~/rescue.errors -o ~/rescue.badblock


Reiserfsck'ing

The bad news is that sector '128', which contains the superblock, is dead :-(. I remove the range 128-500 from the badblock file.

Restoring the superblock goes fine (you need to answer several questions):

 reiserfsck --rebuild-sb /dev/sda6

Then, the usual way, we rebuild the whole partition, using the badblock file:

 reiserfsck --rebuild-tree --scan-whole-partition -B ~/rescue.badblock /dev/sda6

and mount it.