Pointers to Linux Tools for Rescuing Data from Defective Hard Drives
Tilo Sloboda, Aug 2005

After the unfortunate loss of two hard drives in my RAID5 array, and after a friend of mine lost his external drive, I needed to do some research on tools for rescuing data from defective hard drives and damaged filesystems.

I was using a Linux software RAID5 at the time, and while I was trying to repair a single-disk failure, the system would suddenly hang completely because of uncorrectable DMA errors on one of the other drives.

The best advice in this case is:

To create a disk image, you cannot simply use plain dd, because by default it aborts at the first read error. That's why you need tools like dd_rescue / dd_rhelp, or GNU ddrescue, which keep going after errors: they try to create a disk image of the defective drive, salvaging as many blocks as possible from it.
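As a rough illustration of the imaging step, here is a minimal sketch using GNU ddrescue. The function name rescue_image and all file names are made up for this example, and the two-pass approach (a fast pass first, retries afterwards) is just one common way to run the tool, not the only one.

```shell
#!/bin/sh
# Sketch only: rescue_image and all file names are hypothetical.
# Usage: rescue_image SOURCE_DEVICE IMAGE_FILE MAP_FILE
rescue_image() {
    src=$1; img=$2; map=$3

    # Pass 1: copy everything that reads cleanly and skip the slow
    # scraping of bad areas (-n = --no-scrape).
    ddrescue -n "$src" "$img" "$map" || return 1

    # Pass 2: go back to the bad areas recorded in the map file,
    # using direct disc access (-d) and up to 3 retry passes (-r3).
    ddrescue -d -r3 "$src" "$img" "$map"
}
```

It could be called as 'rescue_image /dev/hdX /mnt/rescue/disk.img /mnt/rescue/disk.map'. The map file records which areas have already been read, so a run that was interrupted (for example by a hang and reboot) can be resumed with the same command.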

Once you have created the disk image, make a copy of it!! Creating the image takes a lot of time, and you don't want to run fsck on the original image itself, because you may want to try several different tools to salvage your data! Work only on copies, so you can always start over from the untouched original.
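A minimal sketch of the copy step, assuming GNU cp is available; the function name make_working_copy and the file names are invented for this example.

```shell
#!/bin/sh
# Sketch only: make_working_copy and the file names are hypothetical.
# Usage: make_working_copy ORIGINAL_IMAGE WORKING_COPY
make_working_copy() {
    orig=$1; work=$2

    # Copy the image; --sparse=always (GNU cp) keeps long runs of
    # zero bytes from taking up space on the temporary disk.
    cp --sparse=always "$orig" "$work" || return 1

    # Verify that the copy is byte-for-byte identical before using it.
    cmp "$orig" "$work"
}
```

After 'make_working_copy disk.img work.img', repair tools would be pointed at work.img only, for example 'e2fsck -f work.img' if the image happens to hold an ext2/ext3 filesystem (an assumption made for this example).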

The resulting disk-image file can be mounted via "mount -o loop" and then analyzed.
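If the image was taken from a whole disk rather than from a single partition, the filesystem does not start at byte 0 and the loop mount needs an offset. Here is a small sketch, assuming 512-byte sectors and a start sector read from the output of 'fdisk -l' on the image; the helper name partition_offset and the paths are made up for illustration.

```shell
#!/bin/sh
# Sketch only: partition_offset is a hypothetical helper.
# Prints the byte offset of a partition inside a whole-disk image.
# Usage: partition_offset START_SECTOR [SECTOR_SIZE]
partition_offset() {
    start_sector=$1
    sector_size=${2:-512}    # assumption: 512-byte sectors unless told otherwise
    echo $(( start_sector * sector_size ))
}

# Example (needs root; image name and mount point are placeholders):
#   mount -o loop,ro,offset=$(partition_offset 63) work.img /mnt/rescue
```

Mounting with 'ro' is a deliberate choice here: it lets you look around in the image without changing it.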

I also want to mention one trick I learned a long time ago: you can try to swap the controller board of a defective hard drive with the board from an identical, working drive. This may help if the controller board went bad, but not if the disk itself has bad blocks.

Temporary Data:

This is the time to purchase a large hard drive for storing the temporary data!! You'll probably need at least 2 to 3 times as much temporary space as the size of your defective hard-drive partition(s). The more the better.

It might be a good idea to put the temporary disk space on a different server and cross-mount it. This way you avoid having to fsck your (huge) temp partition every time the system doing the rescue hangs and needs to be rebooted... ;-)

DMA Errors:

First step, if you see DMA errors that hang your system: run 'hdparm -d0 -r1 /dev/hdX' on the raw device of your defective drive before using any of the tools below. This disables DMA for that drive and sets the drive to read-only.
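To double-check that the settings actually took effect, hdparm can be asked to report them. A minimal sketch follows; the wrapper name prepare_drive is hypothetical, and /dev/hdX stands for your own device.

```shell
#!/bin/sh
# Sketch only: prepare_drive is a hypothetical wrapper.
# Usage: prepare_drive /dev/hdX
prepare_drive() {
    dev=$1

    # Disable DMA (-d0) and mark the device read-only (-r1).
    hdparm -d0 -r1 "$dev" || return 1

    # Without a value, -d and -r only report the current settings,
    # so this prints what is now configured for the drive.
    hdparm -d -r "$dev"
}
```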

Other Errors:

Once dd_rhelp has narrowed down the area where the error on the disk is, it will most likely hang your system every time it tries to access the location of the error. To reduce the pain of having to reboot, I ran 'hdparm -d0 -r1 -m0 -P0 -A0 -a0 /dev/hdg' to switch off all kinds of read-ahead on the defective disk (read the man page of hdparm!).

PREVENTION:

Hard disks die! They always(!) do that sooner or later... sooner, if they are not properly cooled(!)... and they tend to die all at the same time, if they were purchased around the same time...

To avoid being surprised by the death of a hard drive, it's highly advisable to monitor the S.M.A.R.T. status of your disks and to run tests on them on a regular basis (e.g. run smartd). This way you can see problems before they become fatal.
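For a quick manual check, the smartctl tool from the same smartmontools package as smartd can be used. The sketch below assumes smartmontools is installed; the wrapper name smart_check is hypothetical, and /dev/hdX stands for your own device.

```shell
#!/bin/sh
# Sketch only: smart_check is a hypothetical wrapper around smartctl.
# Usage: smart_check /dev/hdX
smart_check() {
    dev=$1

    # Overall health self-assessment reported by the drive.
    smartctl -H "$dev"

    # Full attribute table and error log (reallocated sectors, temperature, ...).
    smartctl -a "$dev"

    # Start a short self-test; it runs inside the drive in the background.
    # Read the result later with: smartctl -l selftest "$dev"
    smartctl -t short "$dev"
}
```

The commands are deliberately not chained with '&&': smartctl returns a non-zero exit status when it finds problems, and in that case you still want to see the remaining output.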

MOST IMPORTANT: Make backups of your valuable data! Don't trust just one hard drive!

Rescue Tools:

Here's a list of the tools I found, in no particular order(!). I hope it will help somebody out there.

 

 

A big thank you to Kurt Garloff and Antonio Diaz Diaz!