Finally have a use case where I'd actually benefit from a raid cache :o

Started by Tom, February 13, 2017, 08:20:39 AM


Tom

I've been mucking about with CI for myself and for work. On my local box, I've run into a bit of an IOPS bottleneck :(

I've got it set up to build multiple Docker images in parallel. Not only are they installing quite a few large things, but Docker seems to be pretty inefficient when building images. TONS of random accesses and tar operations... Just how it works, I suppose.

Linux md has gained support for a raid5 cache, which can run in two modes: write-through and write-back. The former writes data to both the cache disk and the array before returning an OK, while the latter returns OK as soon as it's committed to the cache. Assuming a fast enough cache disk and battery backup (plus on-board capacitor protection on the SSD), data loss from an ungraceful shutdown shouldn't happen too often in write-back mode.
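For anyone curious, roughly how that's wired up with mdadm (just a sketch; the device names are placeholders, and write-back mode needs a recent kernel, 4.10-ish or newer):

    # create the raid5 with a journal/cache device on a fast SSD partition
    mdadm --create /dev/md0 --level=5 --raid-devices=4 \
        --write-journal /dev/nvme0n1p1 /dev/sd[b-e]1
    # journal defaults to write-through; flip it to write-back once you trust
    # the SSD's power-loss protection
    echo write-back > /sys/block/md0/md/journal_mode
    cat /sys/block/md0/md/journal_mode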

Being that I'm broke, and we're trying to save, I definitely won't be getting any cache disks for a while. So I'm going to try a raid10-far config instead. Right now the LVM volume group is defragmenting itself; then I'll shrink it and convert the raid5 to a raid10-far using the online conversion support in md raid :D

My poor poor 4x1TB raid5 that I use for VM disks on my big server is struggling pretty hard :(
<Zapata Prime> I smell Stanley... And he smells good!!!

Melbosa

Good luck!  At least you're only dealing with the 4-disc scenario.  I've seen our large SANs re-level arrays that took weeks to a month to complete.
Sometimes I Think Before I Type... Sometimes!

Tom

Quote from: Melbosa on February 13, 2017, 09:19:34 AM
Good luck!  At least you're only dealing with the 4-disc scenario.  I've seen our large SANs re-level arrays that took weeks to a month to complete.
Yeah, I've had a big raid5 or raid6 array take days to reshape/repair.

It'd probably be faster if I just slapped some disks in there and copied things over, but meh. We'll see how long this LVM defrag takes.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

Ok, so I lied. I just slapped two old 1TB drives in there as a raid0, added it to the vm0 LVM volume group (where vm volumes go), and told LVM to move all data off the old raid5.

This will take significantly less time. The old pvmove was doing a bunch of copying and causing all kinds of seeking and random access; since my first post it had only gotten to about 25%, which felt like too long. The new run is going a lot quicker: a nice 50-130 MB/s vs the old ~10-20 MB/s. Already at 3% and I just started it.
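For anyone following along, the rough shape of that (a sketch; device names are placeholders, vm0 is the volume group mentioned above):

    # scratch raid0 out of the two old 1TB drives
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdf1 /dev/sdg1
    pvcreate /dev/md1
    vgextend vm0 /dev/md1
    # migrate every extent off the old raid5 PV; runs online and prints progress
    pvmove /dev/md0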

Once that's done, I'll remove the old md0 raid5 array. Maybe (maybe not...) replace the drives with the 2TB ones and create a new raid10-far array for the VMs to live on. Should give (near) the performance of raid0 with the security of raid1. It's a special raid10 too: it's integrated, so there's no raid0+1 or raid1+0 layering going on, and it's directly and easily extendable.
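The far layout is just a flag at creation time; a sketch with placeholder device names:

    # raid10 with 2 "far" copies across 4 drives: sequential reads approach
    # raid0 speed, with raid1-style redundancy
    mdadm --create /dev/md2 --level=10 --layout=f2 --raid-devices=4 /dev/sd[b-e]1
    # then hand it to LVM as usual
    pvcreate /dev/md2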

I was going to set up my little external SAS enclosure, but the cable I have for it is /so/ short, and I can't reorganize things atm such that the enclosure will fit close enough to the big vm server for that to work. :(

Ah well.

p.s. coming up on 5% now :D
<Zapata Prime> I smell Stanley... And he smells good!!!

Melbosa

Sometimes I Think Before I Type... Sometimes!

Tom

Thanks :)

In the end I hope I notice a real difference. But that raid5 read-modify-write penalty is pretty harsh (a sub-stripe write means reading the old data and old parity, then writing both back: roughly four I/Os per small write, vs two on raid10), so a raid10 should help a lot for small reads/writes and IOPS in general.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

Dang. Just created the raid10-f2 array, and it'll be another ~11 hours before it's done resyncing. Wish I could tell it to skip that step and assume everything is zeros, but that'd require logic it just doesn't have (keeping track of every single block and whether it has been written already).

I technically could just zero all four drives and use "--assume-clean" when I create it, but meh. This'll be done by tomorrow morning, so I'm not too worried.
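For the record, that approach would look something like this (a sketch with placeholder devices; only safe if the drives really are all zeros, since md takes your word for it and skips the resync):

    # zero the members first (slow on spinning disks)
    for d in /dev/sd{b,c,d,e}1; do dd if=/dev/zero of=$d bs=1M oflag=direct; done
    # then create the array and tell md the mirrors already match
    mdadm --create /dev/md2 --level=10 --layout=f2 --raid-devices=4 \
        --assume-clean /dev/sd[b-e]1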
<Zapata Prime> I smell Stanley... And he smells good!!!

Melbosa

Sometimes I Think Before I Type... Sometimes!

Tom

No kidding. Not quite finished yet: 84%, another couple of hours to go at least. But as we all know, spinning rust gets slower the further in you go.
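If anyone else ends up babysitting one of these, the progress and the throttles live here (the echo value is just an example):

    cat /proc/mdstat        # shows % complete and an ETA for the resync
    cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
    # raise the guaranteed minimum resync rate (KB/s) if you want it to push
    # harder even while other I/O is happening
    echo 100000 > /proc/sys/dev/raid/speed_limit_min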
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

And now I'm pvmove'ing the old raid0 over to the new raid10. Woo. Probably be a while yet again lol.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

Man, it was going pretty good, except now it's down to 9MB/s :(
<Zapata Prime> I smell Stanley... And he smells good!!!

Lazybones

Quote from: Tom on February 14, 2017, 05:06:38 PM
Man, it was going pretty good, except now it's down to 9MB/s :(
If you are using spinning rust, the speed will depend on whether it is reading mostly inner or outer tracks, as well as how much seeking is needed.

Tom

Quote from: Lazybones on February 14, 2017, 05:37:42 PM
Quote from: Tom on February 14, 2017, 05:06:38 PM
Man, it was going pretty good, except now it's down to 9MB/s :(
If you are using spinning rust, the speed will depend on whether it is reading mostly inner or outer tracks, as well as how much seeking is needed.
Yeah, but the inner-track issue wouldn't slow it down this much. I'm not sure why it'd drop this low; there should be little if any random seeking involved.
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

So wow, performance on this raid10 array is ATROCIOUS. I don't exactly know why it's so bad. It's worse than single-disk performance for sequential writes. Going to have to spend some time figuring that out.
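A quick way to pin down whether it's the array itself or something above it (a sketch; adjust the target, and be careful, writing to the raw md device destroys whatever is on it):

    # sequential write straight to the md device, bypassing the page cache
    fio --name=seqwrite --filename=/dev/md2 --rw=write --bs=1M --direct=1 \
        --ioengine=libaio --iodepth=16 --runtime=60 --time_based --group_reporting
    # watch per-disk utilization to see if one member is dragging the array down
    iostat -x 5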
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

God damn it. Smartmontools wasn't installed on bender (my big VM box), and all four of those drives in the raid10 have reallocated sectors, especially one of them... I'm going to have to swap them out. Gah.

Who knew all these 1TB Seagates would fail. ::) Of course they aren't meant for 24/7 operation, but still :o 56,600 hours isn't a lot, is it? ;) Six years is fine for a consumer drive, isn't it? j/k. OK, one of them is only nearing 20k hours... but yeah, I think I need to swap out at least three of the drives for newer ones from my stockpile, and only use these ones for temp storage or tertiary backups.
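For anyone checking their own drives, the relevant bits (device name is a placeholder, and the smartd service name can vary by distro):

    smartctl -H /dev/sdb        # overall health verdict
    smartctl -A /dev/sdb | grep -Ei 'Reallocated_Sector|Power_On_Hours|Pending'
    # keep smartd running so it nags you next time instead of staying quiet
    systemctl enable --now smartd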
<Zapata Prime> I smell Stanley... And he smells good!!!

Tom

Ok, so I found a couple of newer 2TB drives that seem to be OK, temporarily set them up as a raid0, and am moving the VM volumes off the old disks. Eventually I'll have to get some proper WD Black or ES-style drives for it.
<Zapata Prime> I smell Stanley... And he smells good!!!

Lazybones

If you are dedicating a box to NAS duty, why not use a dedicated distribution that has all the monitoring/SMART functions configured and exposed almost out of the box?

Tom

Quote from: Lazybones on February 26, 2017, 06:25:24 PM
If you are dedicating a box to NAS duty, why not use a dedicated distribution that has all the monitoring/SMART functions configured and exposed almost out of the box?
The drives in question are my old NAS drives that had been living in a storage container for quite a while, especially the 1TB drives I just yanked. I put them in the VM box as VM storage.

The NAS takes care of itself pretty well, but the KVM box, not so much. :(

I've been thinking about serving some VM storage from the NAS, but I don't know if I really want to do that. I like not having machines be too coupled together. If the NAS needs a reboot, it's OK; nothing else gets bothered most of the time. If it turns into an iSCSI or NFS boot host, then it going down would bug a lot of things.

At some point I want to have some storage duplicated/shared across my two vm servers. But I'd want to use proper drives for that, with a proper setup using glusterfs or DRBD+clvm or something like that.
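If it ever gets that far, the glusterfs side is pretty quick to stand up (a sketch with made-up hostnames and brick paths; the DRBD+clvm route is a lot more involved):

    # on both VM hosts, with a dedicated brick filesystem mounted at /bricks/vmstore
    gluster peer probe vmhost2
    gluster volume create vmstore replica 2 \
        vmhost1:/bricks/vmstore/brick vmhost2:/bricks/vmstore/brick
    gluster volume start vmstore
    # mount it where the VM images live
    mount -t glusterfs vmhost1:/vmstore /var/lib/libvirt/images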
<Zapata Prime> I smell Stanley... And he smells good!!!