Synology DOWN!!!!!

Started by Melbosa, December 13, 2016, 05:15:47 PM


Melbosa

So I had some fun with my Synology DS1812+ this week.  One drive's bearings were starting to go, and it was wobbling in the 8-bay unit.  I picked up a new NAS drive to replace it and started the swap Monday night.  Well, during the RAID rebuild another drive decided to throw SMART errors for bad sectors (my RAID only tolerates the loss of one drive).

I know what you're thinking (at least those of you who understand RAID systems): there go the superblocks, and the data is done!  Well, not so: as of last night the rebuild finished successfully.

But then this morning the Synology wanted to fix some file system errors caused by the bad sectors, which meant a reboot.  I crossed my fingers and pushed the "go ahead" button, because it's 18+ TB of data and I can't possibly back all of that up somewhere else (the important bits are synced to cloud services, so even if I lost the whole RAID, those were covered).  Well, it spent almost all day on "Checking file system" at boot (it was actually running /sbin/e2fsck -C-1 -pvf /dev/vg1000/lv), but finished around 2 PM.  But then something funny happened...

The Synology just sat there.  Responding to pings, responding to SSH, but doing nothing else.   Well I thought "guess it needs a reboot" and proceeded to do so from said SSH session....

And on bootup it started all over again "Checking file system"... WTF!!!! Is my data gone?  Am I hooped?

Well, I did some digging with mdadm commands, ps, and a look through some of the Synology scripts.  The MDs, PV, LV and VG all look good, and the SMART data on the one drive shows it remapping sectors properly.  But the Synology scripts aren't cleaning up after themselves properly.  They write temp files to / to track certain activities, one being a file called .vscan_confirmed, which is supposed to be removed when /sbin/e2fsck -C-1 -pvf /dev/vg1000/lv completes, but isn't.
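
For anyone retracing the MD part of that check, here is a minimal sketch, not the exact commands used above. It assumes the usual Linux /proc/mdstat layout, where each array's member map is a bracketed string of U (up) and _ (missing) characters. The helper name degraded_arrays is made up for illustration.

```shell
#!/bin/sh
# Print the name of every md array whose member map contains a "_",
# meaning at least one member is missing or failed.
# Reads mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/      { name = $1 }    # remember the current array
        /\[[U_]*_[U_]*\]/  { print name }   # member map such as [UUUU_UUU]
    '
}

# Only read /proc/mdstat when it exists, so the sketch is harmless elsewhere.
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

No output means every array reports all members up. The LVM side would be a separate look with pvdisplay, vgdisplay and lvdisplay, assuming those tools are present on the box.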

So on reboot, guess what it does again? /sbin/e2fsck -C-1 -pvf /dev/vg1000/lv

Now I am just waiting for the /sbin/e2fsck -C-1 -pvf /dev/vg1000/lv run to complete before I remove this 0-byte file and try a second reboot.  /crossingfingers it all comes back up, but like I said, the MDs, PV, LV, and VG all look fine, so I'm hoping this is all that is needed.
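
The "wait for the check, then deal with the leftover file" logic could be sketched roughly like this. It is an illustration under assumptions, not Synology's own script: flag_action is a hypothetical helper, the flag path is the one named in this post, and the function only reports what to do; it never deletes anything itself.

```shell
#!/bin/sh
# Decide what to do about a leftover boot-time check flag.
# stdin: a process listing (for example the output of "ps aux")
# $1:    path of the flag file
flag_action() {
    flag=$1
    if grep -E '(^|[ /])(e2fsck|debugfs)( |$)' >/dev/null; then
        echo "wait"      # a check is still running: leave everything alone
    elif [ -e "$flag" ]; then
        echo "remove"    # no check running, but the flag was left behind
    else
        echo "ok"        # nothing to clean up
    fi
}

# On the NAS itself (path taken from the post above):
#   ps aux | flag_action /.vscan_confirmed
```

Keeping the decision separate from the actual rm means nothing gets removed while a check is still grinding away.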

BTW: none of this is documented in Synology's tech docs... glad to have Cova's and my Linux experience to save the day on this one - at least I hope.  Update to follow once I can safely try my next step.
Sometimes I Think Before I Type... Sometimes!

Darren Dirt

I was told this is an English-only forum...  :P
_____________________

Strive for progress. Not perfection.
_____________________

Mr. Analog

Quote from: Darren Dirt on December 13, 2016, 08:19:27 PM
I was told this is an English-only forum...  :P

This is Tech Chat, get out

Mel, let me know how this goes. I bought all my drives at once, so I figure if one fails they'll all be close to failure when the time comes.
By Grabthar's Hammer

Melbosa

Quote from: Mr. Analog on December 13, 2016, 09:08:23 PM
Mel, let me know how this goes. I bought all my drives at once, so I figure if one fails they'll all be close to failure when the time comes.

Well, it all worked out as predicted!  Actually, there was a second step, debugfs -q /.remap.vg1000.lv /dev/vg1000/lv, and nothing indicates that it is running or needs to complete.  So while the device looks like it is offline at that point, you can actually see it working away through top if you SSH in.

So what's the lesson learned here?  I learned a lot more about how a Synology uses LVM and MD for its RAID, and here is what I would suggest to anyone with issues: make sure you enable SSH on your Synology, and always check the output of "top" and "ps -aux" to make sure it isn't actually trying to complete something for you before you start messing with it.
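
A rough sketch of that "is it actually busy?" check, for when you want something quicker than reading the whole process list. It assumes "ps aux"-style output where the third column is %CPU. The helper name busiest is invented here, and a stripped-down sort on some boxes may not support the -k option.

```shell
#!/bin/sh
# Show the N busiest processes from "ps aux"-style text on stdin.
# $1: how many lines to show (defaults to 5)
busiest() {
    n=${1:-5}
    # Drop the header line, sort by the %CPU column (highest first), keep the top N.
    awk 'NR > 1' | LC_ALL=C sort -k3,3nr | head -n "$n"
}

# Over SSH on the NAS:
#   ps aux | busiest 3
```

If something like e2fsck or debugfs sits at the top of that list, the box is still working and is best left alone.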

Mr. Analog

Will do! I turned off SSH when I was done running some custom scripts, but after hearing your story I think I'll enable it permanently.

Thanks!


Tom

Just don't expose SSH to the WAN :D Also think about using key-based auth (public keys), not password auth.
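
A small sketch for checking the password-auth half of that advice. It assumes a stock OpenSSH sshd_config with plain whitespace-separated "Keyword value" lines and ignores Match blocks and Include files. How DSM manages that file is not covered here, and password_auth_enabled is a made-up name.

```shell
#!/bin/sh
# Succeed (exit 0) if the sshd_config text on stdin leaves password logins enabled.
# OpenSSH uses the first value it finds for a keyword, and PasswordAuthentication
# defaults to yes when it is not set at all.
password_auth_enabled() {
    awk '
        tolower($1) == "passwordauthentication" { value = tolower($2); exit }
        END { if (value == "no") exit 1; exit 0 }
    '
}

# Example, assuming the config lives at the usual path:
#   password_auth_enabled < /etc/ssh/sshd_config && echo "password logins still on"
```

Commented-out lines are skipped, because their first field starts with "#" and so never matches the keyword.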
<Zapata Prime> I smell Stanley... And he smells good!!!

Mr. Analog

Quote from: Tom on December 14, 2016, 10:03:46 AM
Just don't expose SSH to the WAN :D Also think about using key-based auth (public keys), not password auth.

Duh!

I'm so paranoid I turn it on and off as needed... well until hearing this here story :)