Starting the new year off in an appropriate manner (Murphy's Law at work)
Wednesday, January 2, 2013 at 08:44PM
wesperdue

In the IT world, Murphy's Law (if anything can go wrong, it will) rules. With irony being the fad du jour, it seems only right and proper that I'm starting off 2013 with a terrific example of Murphy's Law at work in my own IT world.

I've been off for a week and a half, as my company shut down for the holiday. Of course, servers don't like to be shut down for long periods, so the servers I manage didn't get a holiday.

Last year, I had a system with a 4-disk RAID 5 array lose two hard drives over a weekend. RAID 5 only has enough redundancy to handle the loss of one disk at a time without data loss. This reinforced my belief that RAID 5 should not be used for arrays of four or more disks.

I've not deployed RAID 5 in at least the last two years. I've been using RAID 10 (i.e. RAID 1+0, striping across mirrored pairs) for quite some time. It can lose up to half of its disks without data loss, provided that no two of the failed disks are mirrors of each other.
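To put numbers on that comparison, here is a minimal Python sketch that enumerates every possible two-disk failure in a four-disk array. The disk numbering and the mirror-pair layout (disks 0 and 1 in one pair, disks 2 and 3 in the other) are assumptions made for illustration, not the layout of any particular controller.

```python
from itertools import combinations

def raid5_survives(failed):
    # RAID 5 tolerates at most one failed disk, no matter which one it is.
    return len(failed) <= 1

def raid10_survives(failed, mirror_groups):
    # RAID 10 survives as long as every mirrored pair keeps at least one working disk.
    failed = set(failed)
    return all(not set(group) <= failed for group in mirror_groups)

disks = [0, 1, 2, 3]
mirror_groups = [(0, 1), (2, 3)]  # assumed layout: hypothetical, for illustration only

# Every way two of the four disks can fail together.
pairs = list(combinations(disks, 2))
raid5_ok = [p for p in pairs if raid5_survives(p)]
raid10_ok = [p for p in pairs if raid10_survives(p, mirror_groups)]

print(f"RAID 5 survives {len(raid5_ok)} of {len(pairs)} two-disk failures")
print(f"RAID 10 survives {len(raid10_ok)} of {len(pairs)} two-disk failures")
```

Under those assumptions, RAID 10 rides out four of the six possible two-disk failures, while RAID 5 rides out none. The two fatal cases for RAID 10 are exactly the ones where both failed disks belong to the same mirrored pair.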

I got a case at 8 o'clock this morning about an unresponsive host. I checked its hardware monitor, and it had two drives down, out of four. According to my understanding of the controller, the disks were in the same stripe and their mirrors were fine. I replaced the disks, but a rebuild didn't start. I checked the controller, and it no longer recognized any disks. I couldn't get the host to recognize anything.

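I started rebuilding the host from scratch before lunch; it's finishing up now. I now understand that on this controller the disk groups are the mirrors, and a RAID 10 volume spans (i.e. stripes) across the disk groups. That would mean the two failed disks were mirrors of each other, not members of the same stripe as I had assumed. I seem to learn something new every day; that's why I love my job.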

What's the moral of this story? Hard drives will die, and it's very difficult to completely prevent data loss. Backups are essential, unless you can completely re-create a system, as we do.

Make certain you have a good backup system. Don't depend on redundant storage.

Article originally appeared on Wes Perdue's Journal of Geekery (http://wesperdue.net/).
See website for complete article licensing information.