July 24th, 2007

Sorry about your broken RAID 5

Posted by Robin Harris @ 12:45 pm

Categories: Disk drives, RAID

Tags: Disk, Industry, RAID, RAID 5, Unrecoverable Read Error, Robin Harris

You didn’t know?
I’m sorry I was the one to tell you that RAID 5 is broken today and will be well and truly broken in 2009 (see Why RAID 5 stops working in 2009), but somebody had to do it. The good news is that the industry is ahead of you developing solutions.

I found the negative response to my last post on the unrecoverable read error (URE) issue fascinating. A number of informed people commented, correcting my math - I took 2 statistics courses in grad school, but that was a long time ago - and taking issue with some of my arguments. All good.

What was interesting to me was that my post didn’t say anything that people in the industry haven’t known for years. For example, this Intel white paper published last year:

Intelligent RAID 6 Theory Overview And Implementation

RAID 5 systems are commonly deployed for data protection in most business environments. However, RAID 5 systems only tolerate a single drive failure, and the probability of encountering latent defects [i.e. UREs, among other problems] of drives approaches 100 percent as disk capacity and array width increase.

Every engineer in the RAID business knows this. So a) why don’t technically oriented ZDNet readers know it, and b) why the emotional response to a statistical argument grounded in the drive vendors’ own specs?

Misplaced faith in RAID
Beyond the issues with my communication skills I saw several themes:

  • My RAID works great (and therefore always will?)
  • Sensationalism, hype and I don’t believe you. La-la-la-la-la!
  • Power factors always surprise people.

It reminded me of a comment from a SOHO/SMB RAID designer a few months back:

I was a big proponent of RAID until I found that our customers were placing so much faith in RAID that they were putting all their data on the NAS and then _deleting_ it from ALL other locations. In many cases, they had no off-site storage strategy for their data.

Array vendors take this seriously
Regular readers know I’m not a fan of the array vendors. I’m critical of an architecture where the raw disk capacity comprises only 10% of the cost of a “solution.” I believe there are better ways to protect data economically.

Yet industry engineers do take data availability and integrity very seriously. They see most problems well before customers because they are working with the largest population of equipment.

That’s why almost every vendor offers some version of RAID 6 to protect against double errors. Even with enterprise disks, whose smaller capacity and error rate of 1 in 10^15 bits make data loss from a disk failure + URE much less likely (1 URE per 10^15 bits is 1 URE for every 125 TB read), RAID 6 is often recommended, because in mission-critical environments even a 1% chance of an array read error after a disk failure is often too great.
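For readers who want to check the arithmetic, here is a rough sketch in Python. It assumes bit errors are independent and occur at exactly the spec'd rate, which real drives only approximate, and the function names are mine, not anything from a vendor tool.

```python
import math

TB = 1e12  # decimal terabyte, as drive vendors count it


def bytes_per_ure(bits_per_error: float) -> float:
    """Average number of bytes read per unrecoverable read error."""
    return bits_per_error / 8


def ure_probability(bytes_read: float, bits_per_error: float) -> float:
    """Chance of at least one URE while reading `bytes_read` bytes.

    Assumes independent bit errors at a rate of 1 per `bits_per_error`
    bits, i.e. 1 - (1 - p)^n, computed in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


if __name__ == "__main__":
    rate = 1e15  # enterprise spec quoted above: 1 URE per 10^15 bits
    print(f"{bytes_per_ure(rate) / TB:.0f} TB read per URE on average")
    # Hypothetical rebuild sizes: data read from the surviving drives.
    for tb in (1, 2, 6, 12):
        print(f"{tb:>3} TB read -> {ure_probability(tb * TB, rate):.2%}")
```

Under these assumptions, the 1% threshold mentioned above is crossed once a rebuild has to read a bit more than 1.25 TB from the surviving drives.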

The industry isn’t stopping there
Some other initiatives include

  • 4K sectors - Drive vendors have been lobbying OS vendors for years to raise the block size from 512 bytes to 4KB, which enables more robust ECC without a big capacity hit. Word is that Microsoft might actually, maybe, do it. Next time you see Ballmer, ask him about it. Why wait for Apple to do it first?
  • Many arrays do background sector scrubbing, looking for sectors with currently recoverable read errors and rewriting or removing them before they cause a problem.
  • NAS boxes that virtualize disks as a pool of blocks can combine their file system knowledge to enable data redundancy on a per-file basis for greater availability. A URE on an unused block isn’t a problem since the NAS file system knows what blocks are in use and which aren’t.
  • Advanced file systems like ZFS, which combine file system and volume management functionality, can combine their parity data with parent-block checksums to perform “. . . combinational reconstruction of a RAID set.” (Thanks, Joerg!)
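To make that last item concrete, here is a minimal Python sketch of the idea behind combinatorial reconstruction. This is not ZFS code. It assumes single XOR parity, uses SHA-256 as a stand-in for the parent-block checksum, and every name in it is hypothetical. The point: when the data reads back without a drive reporting an error but the checksum does not match, the file system can assume each block in turn is the bad one, rebuild it from parity, and keep the combination whose checksum verifies.

```python
import hashlib
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def checksum(blocks: Sequence[bytes]) -> bytes:
    """Stand-in for a checksum stored in the parent block."""
    return hashlib.sha256(b"".join(blocks)).digest()


def combinatorial_reconstruct(
    data: Sequence[bytes], parity: bytes, expected: bytes
) -> Optional[List[bytes]]:
    """Return a stripe whose checksum matches `expected`, or None."""
    if checksum(data) == expected:
        return list(data)  # nothing is wrong
    for bad in range(len(data)):
        # Assume block `bad` is silently corrupt; rebuild it from the rest.
        others = [blk for i, blk in enumerate(data) if i != bad]
        candidate = list(data)
        candidate[bad] = xor_blocks(others + [parity])
        if checksum(candidate) == expected:
            return candidate
    return None  # more damage than single parity can repair


if __name__ == "__main__":
    good = [b"aaaa", b"bbbb", b"cccc"]
    parity, expected = xor_blocks(good), checksum(good)
    damaged = [b"aaaa", b"bXbb", b"cccc"]  # silent corruption, no URE reported
    print(combinatorial_reconstruct(damaged, parity, expected) == good)
```

A plain RAID 5 controller has no checksum to test against, so it cannot tell which block is wrong. That is the advantage of combining file system and volume management.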

That list just scratches the surface of all the work the industry is doing to ensure data availability and integrity as disk drives continue their capacity growth. RAID 5 is reaching its end of life, but your data can still be safe despite that.

Comments welcome, as always. Industry folks, what else is happening to manage this issue?

Robin Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. See his full profile and disclosure of his industry affiliations.


