NAS and Server rescue

Started by Tom, February 08, 2015, 02:36:13 PM

Previous topic - Next topic

Tom

So... I had some fun working on two computers this week.

Basically my NAS dropped 2 drives out of the 7 in its RAID 5 array. Usually this means your data is gone. Just gone. So as you can imagine, I was a bit upset. Not only that, but my backup of the NAS array had been "borrowed" while I was playing with that new big server I built last summer, so I had no backup of any of the data on the array. On top of that, the backup array had also kicked out two members and was DOA. I suspect these SAS cards dislike it when a malfunctioning drive is connected, and will cause delays or spurious errors when accessing other drives, which makes mdraid boot them out of the array.

I've spent quite a bit of time this week trying to rescue the data on both arrays.

I learned about:

  • ways to rescue a failed md RAID array without risking your data or having to buy an entire set of replacement drives
  • a super cool tool called GNU parallel
  • udevadm info gives very useful information about drives (firmware version, serial number, etc.)
  • just how @%&#ty recent Seagate drives are (40% failure rate for the 3TB disks from a year or two ago)

Tools used:

  • linux
  • dd
  • gddrescue
  • testdisk
  • mdadm --detail, --query, --assemble --force
  • dmesg
  • udevadm info - fetch important details from drives, like the drive serial and model numbers
  • gnu parallel - run multiple copies of a similar command at the same time given some arguments
  • Seagate GrenadaBP Linux CLI flasher
  • Seagate "1TB/disk platform" DOS flasher
  • losetup - create devices backed by a file
  • dmsetup - create devices that use one device as the original read-only backing store, and a read-write device over top
  • Zalman VE300 external virtual disk drive - USB3 external hd caddy supporting mounting ISOs as a virtual CDROM
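
As a rough sketch of how udevadm and GNU parallel fit together here (device names are examples, and the exact property names vary a bit between udev versions):

```shell
# Pull serial number, model, and firmware revision for one drive
udevadm info --query=property --name=/dev/sda \
  | grep -E 'ID_SERIAL_SHORT|ID_MODEL=|ID_REVISION'

# Same query against a whole shelf of drives at once with GNU parallel;
# --tag prefixes each output line with the device it came from
parallel --tag "udevadm info --query=property --name={} | grep ID_SERIAL_SHORT" \
  ::: /dev/sd{a..g}
```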

The process so far has gone like so:

  • Buy, and wait for, two 2TB WD Red replacement drives
  • Do a days-long burn-in test on both replacements
  • Boot NAS back up
  • map devices to serial numbers and drive errors using dmesg, smartctl, mdadm, and udevadm
  • see that one disk has 16k reallocated sectors; copy that disk to a new WD Red
  • shut down the NAS, insert the WD Red, and boot back up
  • google for the right way to rescue an md RAID 5 array after two drives have been booted
  • find, and follow, this very helpful howto. It explains a method of bringing the array back up without modifying any data on any of the raid members. It does so with loopback devices and device-mapper overlays, so any writes go to files on another disk. That way mdraid can update superblocks and the filesystem checker can do repairs, all without touching the source disks. This is VERY handy if you don't have room to store a copy of each disk; normally people recommend you copy each disk to a spare if you really care about the data.
  • ?? (see below for more details)
  • PROFIT!
  • Array came back up, and most or all of the files are there. I have no idea how many files might be corrupted at this point, but I checked a few and they seemed fine. (I have no way of being sure; 6.5TB of files is a lot.)
  • Move to working on the server array, run a simple mdadm --force --assemble, and voila, it assembled
  • check server array to see what it contains, surprise: nothing important.
  • do a full read test on ALL 5 3TB Seagates (yes, yes, I know...)
  • one drive errors out a few hundred MB to a few GB in
  • notice that smartctl recommends checking all 5 drives for firmware updates; also, one drive has 65536 "Start Stop Count" events??? (I'll be keeping an eye on that one; it would explain the constant head-parking noises)
  • ALL 5 drives have /IMPORTANT/ firmware updates.
  • grab the two firmware updates for the two different models of drive, and apply them
  • re-try full read test, all reads complete successfully
  • rebuild array, and format with xfs
  • copy the entire contents of the NAS array to the backup array
  • wait 20 hours for both the copy and the server array resync to finish
  • Go back to the NAS, unmount the fs, stop the array, undo the fancy overlay, shut down
  • Apply a BIOS update, because why not
  • apply firmware updates to all 5 Seagate drives, one at a time, because it was easier than taking the whole thing apart to install more SATA cables
  • install second WD Red
  • install OpenMediaVault (a Debian-based NAS distro)
  • rebuild array, copy files back to nas
  • currently waiting for copy to finish
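
For reference, the overlay technique from that howto looks roughly like this. This is a sketch of the general dmsetup-snapshot recipe with made-up device and file names, not the guide's exact commands:

```shell
# One raid member shown; repeat for each. The sparse file holds only the
# writes (copy-on-write), so it can live on a much smaller spare disk.
truncate -s 4G /mnt/spare/overlay-sdb1.img
OVL=$(losetup -f --show /mnt/spare/overlay-sdb1.img)

# Make the real member read-only, then stack a device-mapper snapshot on it:
# reads fall through to /dev/sdb1, writes land in the overlay file instead
blockdev --setro /dev/sdb1
SIZE=$(blockdev --getsz /dev/sdb1)      # size in 512-byte sectors
dmsetup create sdb1-ovl --table "0 $SIZE snapshot /dev/sdb1 $OVL P 8"

# Assemble the array from the overlays, never from the raw disks, so mdadm's
# superblock updates and fsck's repairs can't touch the originals
mdadm --assemble --force /dev/md0 /dev/mapper/sdb1-ovl /dev/mapper/sdc1-ovl
```

If the rescue goes sideways, you just remove the snapshots and delete the overlay files; the original members are untouched.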

I tell you what, when I got the old array back up and things seemed ok, I was SO happy.

It seems the XFS filesystem is fault tolerant enough to withstand quite a bit of shenanigans, and so is mdraid; it will let you re-assemble an existing array even with a disk that doesn't "match" the rest of the array. You have to be careful with "mdadm --assemble --force" though: it modifies your drives. In particular, it updates the "event count" in the mdraid superblock of a stale disk to match the rest of the disks, and if it finds two disks it thinks are OK to re-add, it will start resyncing. If a disk's data is far enough behind, that resync can and will corrupt everything, so hope and pray ;) If that doesn't work, you can try "mdadm --create --assume-clean" with the exact settings the array was created with, and that will give you an array that assembles. It is very dangerous though: it assumes the drives are clean and the parity matches reasonably well. If the parity or data doesn't match well enough, you are guaranteed some significant corruption, and you will have to do some more serious data recovery (ie: photorec).
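A hedged sketch of that escalation ladder (the array parameters below are placeholders; yours must come from your original creation command or from mdadm --examine output):

```shell
# Step 0: look before touching anything; event counts tell you how far
# behind each kicked member is
mdadm --examine /dev/sd[b-h]1 | grep -E 'Events|Array State|Device Role'

# Step 1: forced assemble. This rewrites event counts in the superblocks,
# so run it against overlays or disk images if the data matters.
mdadm --assemble --force /dev/md0 /dev/sd[b-h]1

# Step 2 (last resort): recreate in place, telling mdadm the members are
# already in sync. Level, chunk size, metadata version, device COUNT and
# ORDER must all match the original exactly, or you get garbage.
mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=7 \
      --chunk=512 --metadata=1.2 /dev/sd[b-h]1
```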

During the recovery, I made a few serious mistakes:


  • re-partitioned and formatted an SSD that I thought was unused on the machine; it actually contained the external RAID write-intent bitmap and the external XFS log journal
  • saved the magic device overlays to the newly formatted SSD
  • Grabbed testdisk to scan for old partitions. It found the old XFS log partition, and I proceeded to save that partition TO THE SSD THAT CONTAINED IT PREVIOUSLY. Not only that, but I forgot to set the start read position when using dd, which probably overwrote some of the original log, corrupting it. When I realized that, I re-ran it and tried saving it to that SSD yet again, and this time it was actually overwriting the location I was reading from. I am pretty sure the SSD did not like that even a little; it was erroring out 100MB in or so.

In the end, I just gave XFS a blank 128MB file-backed loop device for the external log. Despite all of that, XFS was fine, and there were very few errors on mount. I will not be using an external log again :D (it can speed up performance, as can the mdraid external write-intent bitmap, since it hits a different "spindle" and causes less disk thrashing)
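The file-backed external log amounts to something like this (paths are made up; the mount option and xfs_repair flag are the standard ones for external XFS logs):

```shell
# Create a blank 128MB file and expose it as a block device
truncate -s 128M /root/xfs-log.img
LOGDEV=$(losetup -f --show /root/xfs-log.img)

# Repair first (the filesystem must be unmounted), then mount, telling XFS
# where its external log lives in both cases
xfs_repair -l "$LOGDEV" /dev/md0
mount -o logdev="$LOGDEV" /dev/md0 /mnt/nas
```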

I had some additional problems with the server. The version of systemd it uses treats ALL mounts in /etc/fstab as SUPER IMPORTANT, so if any one of them fails it falls back to an emergency login prompt. That honestly isn't too terrible, but it was launching TWO of those prompts on the same console, causing the input to be split between them and making it nearly impossible to do anything (see attached image). I eventually got in by changing init (init=/bin/bash), which is where I did most of the work to rebuild the backup array. Then, to fix the boot issue, I manually told systemd to boot into emergency mode instead of waiting for a failure (so it didn't even try to mount the NFS share), and removed the failing entry (the NFS share on the NAS) from fstab. I can't tell you how mad all that made me last night. So pissed off. I'm getting angry just thinking about it. lol.
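For what it's worth, systemd won't drop to emergency mode over a mount it considers non-critical, so an fstab entry along these lines avoids the whole mess (the share path here is made up):

```shell
# /etc/fstab — "nofail" keeps an unreachable NFS server from failing the boot;
# x-systemd.automount defers the mount until something first accesses it
nas:/export/share  /mnt/nas  nfs  nofail,x-systemd.automount,_netdev  0  0
```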

Hopefully someone finds this interesting at the very least, if not useful :)
<Zapata Prime> I smell Stanley... And he smells good!!!

Melbosa

Very Interesting and Good Job!
Sometimes I Think Before I Type... Sometimes!

Tom

Here's a suggestion for everyone: if you have ANY Seagate drives, especially from the past few years, GO CHECK for firmware updates NOW. I think I have had 4-5 Seagates fail in the past 5 years or so: a couple 1TB (if not more), a couple 2TB, and possibly one 3TB (it isn't dead yet, and I'm hoping the firmware update will keep it going till I can afford a replacement). Secondary to that, do not buy any 3TB Seagate made in the past few years. 40% failure rate.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

SMART info from 4 drives that have failed in the past few years:


/dev/sda   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       366
/dev/sda   5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail  Always   FAILING_NOW 4015
/dev/sda 183 Runtime_Bad_Block       0x0032   001   001   000    Old_age   Always       -       249
/dev/sda 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
/dev/sdd   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
/dev/sdd   5 Reallocated_Sector_Ct   0x0033   094   094   036    Pre-fail  Always       -       9088
/dev/sdd 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
/dev/sdd 198 Offline_Uncorrectable   0x0010   089   087   000    Old_age   Offline      -       1824
/dev/sde   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       92
/dev/sde   5 Reallocated_Sector_Ct   0x0033   086   086   036    Pre-fail  Always       -       18416
/dev/sde 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
/dev/sde 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
/dev/sdf   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       84
/dev/sdf   5 Reallocated_Sector_Ct   0x0033   072   053   036    Pre-fail  Always       -       37096
/dev/sdf 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
/dev/sdf 198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       27720
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

Those four drives consist of one RMA replacement 1TB and THREE 2TB Seagates. One of them went last spring or last winter (can't remember now), and two just recently failed, one before June and the other in October. So dumb.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

Fun story: one of the 3TB Seagates that I was hoping I had saved during the firmware update process has started to give some SMART errors. Thank god for setting up smartd again on that box (this is the backup array in my home server, so no fancy web admin for any of this).

It spewed two errors every day for a week, it seems: CurrentPendingSectors and OfflineUncorrectableSectors. They have gone away now though, and no new emails. That was suspicious enough to make me check the drive with smartctl, and now I have 5 ReportedUncorrectableSectors. So yay.
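The smartd setup behind those emails is just a line in /etc/smartd.conf; a minimal sketch (the address is a placeholder):

```shell
# /etc/smartd.conf — monitor all attributes on every drive found (-a), enable
# offline data collection and attribute autosave, run a short self-test
# nightly at 02:00 and a long one Saturdays at 03:00, and mail on problems
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) \
  -m admin@example.com
```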

So I get to buy two new drives. Woo hoo. (Two because I want a spare.) I was looking at Backblaze's latest HDD roundup, and the "best" 3TB drive that I can get (that isn't a Seagate) is a Toshiba. NCIX has them on sale for like $120; I might go with those, but I don't know if I should risk it, as the Newegg reviews are absolutely horrible. HALF of the reviews report drives that were DOA or dead within a year. It's that or I go with WD Reds for $150 :o which also don't have great reviews... *sigh*
<Zapata Prime> I smell Stanley... And he smells good!!!

Thorin

Are the WD Reds better than the WD Blacks?
Prayin' for a 20!

gcc thorin.c -pedantic -o Thorin
compile successful

Tom

Quote from: Thorin on April 06, 2015, 08:41:58 PM
Are the WD Reds better than the WD Blacks?
They are "meant" for NAS duties. I don't know if that truly means they are better at it than Blacks, but WD will actually warranty them for NAS/RAID use, and they include TLER (time-limited error recovery), which the Blacks haven't supported in years. TLER is good for RAID because it lets the array recover the data ASAP from parity, rather than waiting up to a couple of minutes for the drive itself to attempt, and potentially fail, recovery.
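Side note: on drives that expose it, you can query (and sometimes set) those TLER timers yourself via SCT Error Recovery Control; a sketch, with a placeholder device:

```shell
# Query the drive's error recovery timers; non-TLER consumer drives usually
# answer "SCT Error Recovery Control command not supported"
smartctl -l scterc /dev/sda

# Where supported, cap read and write recovery at 7 seconds (units of 0.1s);
# note this setting is typically lost on power cycle
smartctl -l scterc,70,70 /dev/sda
```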
<Zapata Prime> I smell Stanley... And he smells good!!!

Thorin

Hmm, I'm pretty sure I'm just using all WD Blacks in the Drobo.  And it's been humming along for several years now.
Prayin' for a 20!

gcc thorin.c -pedantic -o Thorin
compile successful

Tom

Quote from: Thorin on April 06, 2015, 09:16:50 PM
Hmm, I'm pretty sure I'm just using all WD Blacks in the Drobo.  And it's been humming along for several years now.
You can get away without TLER, especially if the machine doesn't use RAID. Blacks are quite good though, and if they are old enough, they may just have TLER available (they did for a long time, till people actually started buying them instead of WD's higher-priced enterprise drives!).
<Zapata Prime> I smell Stanley... And he smells good!!!

Thorin

Yay, I bought something good without really realizing it and without intending to!

I just, I'd read about the Caviar Greens, or whatever they're called, and how they weren't meant for always-on usage.  Which just seemed completely counter to what I'd want my hard drive to be designed for...

Oh, and hopefully you get some drives that work better for you.
Prayin' for a 20!

gcc thorin.c -pedantic -o Thorin
compile successful

Tom

Quote from: Thorin on April 06, 2015, 09:52:21 PM
Yay, I bought something good without really realizing it and without intending to!
Well, WD Blacks have always been intended for "enthusiasts". They are WD's high-end consumer line; something you'd put in a workstation or game rig back in the day (now you'd just use SSDs :o)

Quote from: Thorin on April 06, 2015, 09:52:21 PM
I just, I'd read about the Caviar Greens, or whatever they're called, and how they weren't meant for always-on usage.  Which just seemed completely counter to what I'd want my hard drive to be designed for...
They are meant for regular consumer workloads, which means being powered on for maybe 4-6 hours a day. I have two Greens still, but I'm suspicious of at least one of them. I think I'd only use them for stuff that is mostly idle (ie: not a NAS device, or Linux, which likes to keep the disk spinning 24/7).

Quote from: Thorin on April 06, 2015, 09:52:21 PM
Oh, and hopefully you get some drives that work better for you.
I'm going with the WD Reds. It was that or the HGST NAS drives from Newegg, but I prefer dealing with NCIX at the moment when I have to worry about returns.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

I just had another Seagate die in my NAS. Not even a plain old URE or read/write error; it just fell off the bus. I'll be messing with it to figure out what went wrong :o Luckily I have a spare for that box.
<Zapata Prime> I smell Stanley... And he smells good!!!