May 28th, 2007

RAID storage explained

Posted by George Ou @ 5:48 pm

Categories: Desktop, Hardware, Servers, Storage

Tags: Hard Drive, Throughput, Capacity, Array, Storage, Data, RAID, Parity, George Ou

Since I’ve been covering a lot of storage technology lately, both for the enterprise and for the home, I thought I should explain what RAID storage is. I won’t go into every RAID type under the sun; I just want to cover the basic types of RAID and their benefits and tradeoffs.

RAID originally stood for Redundant Array of Inexpensive Disks, but RAID setups were traditionally very expensive, so the “I” came to mean Independent. Costs have recently come down significantly because of commoditization, and RAID features are now built into most higher-end motherboards. RAID was designed primarily to improve fault tolerance, offer better performance, and simplify storage management by presenting multiple hard drives as a single storage volume. Before we start talking about the different RAID types, I’m going to define some basic concepts.

Fault tolerance defined:
Basic fault tolerance in the world of storage means your data is intact even if one or more hard drives fail. Some of the more expensive RAID types permit multiple hard drive failures without loss of data. There are also more advanced forms of fault tolerance in the enterprise storage world, such as path redundancy (AKA multi-path), which allows storage controllers and the connectors that attach hard drives to fail without loss of service. Path redundancy isn’t considered a RAID technology, but it is a form of storage fault tolerance.

Storage performance defined:
There are two basic metrics of performance in the world of storage: I/O performance and throughput. In general, read performance is valued more than write performance because storage devices spend the majority of their time reading data. I/O (Input/Output) performance is the measure of how many small random read/write requests can be processed in a single second, and it is very important in the server world, especially for database-type applications. IOPS (I/O operations per second) is the common unit of measurement for I/O performance.
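
To see why IOPS and throughput are separate metrics, it helps to note that throughput is simply the request rate multiplied by the request size. The sketch below is only an illustration: the function name is mine, and the drive figures (150 small random requests per second, 70 large sequential requests per second) are hypothetical round numbers, not measurements.

```python
def throughput_mb_per_sec(iops, request_size_bytes):
    """Throughput implied by a given request rate at a fixed request size."""
    return iops * request_size_bytes / 1_000_000  # decimal megabytes

# Hypothetical drive: 150 random 4 KB requests per second.
random_small = throughput_mb_per_sec(150, 4 * 1024)

# The same hypothetical drive streaming 70 sequential 1 MB requests per second.
sequential_large = throughput_mb_per_sec(70, 1024 * 1024)

print(f"random 4 KB:     {random_small:.2f} MB/sec")
print(f"sequential 1 MB: {sequential_large:.2f} MB/sec")
```

Under these assumed numbers, the same drive moves well under 1 MB/sec when serving small random requests but more than 70 MB/sec when streaming, which is why a database server cares about IOPS while a home user copying large files cares about throughput.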

Throughput is the measurement of how much data can be read or written in a single second. It is important in certain server applications and very desirable for home use. Throughput is typically measured in MB/sec (megabytes transferred per second), though Mbps (megabits per second) is sometimes also used to describe storage communication speeds. Megabits and megabytes are sometimes confused because they sound alike. For example, 100 Mbps Fast Ethernet might sound faster than a typical hard drive that gets 70 MB/sec, but this would be like thinking that 100 ounces weighs more than 70 pounds. In reality, the hard drive is much faster because 70 MB/sec is equivalent to 560 Mbps.
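
The megabit versus megabyte comparison comes down to a factor of eight, since a byte is eight bits. Here is a minimal sketch of the arithmetic; the function names are my own.

```python
BITS_PER_BYTE = 8

def mb_per_sec_to_mbps(mb_per_sec):
    """Convert megabytes per second to megabits per second."""
    return mb_per_sec * BITS_PER_BYTE

def mbps_to_mb_per_sec(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE

drive_mbps = mb_per_sec_to_mbps(70)            # the 70 MB/sec hard drive
ethernet_mb_per_sec = mbps_to_mb_per_sec(100)  # the 100 Mbps Fast Ethernet link

print(drive_mbps, "Mbps")
print(ethernet_mb_per_sec, "MB/sec")
```

Expressed in the same unit, the hard drive moves 560 megabits per second against Fast Ethernet's 100, or equivalently 70 MB/sec against 12.5 MB/sec.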

RAID techniques defined:
There are three fundamental RAID techniques and the various RAID types can use one or more of these techniques. The three fundamental techniques are:

  • Mirroring
  • Striping
  • Striping with parity

Mirroring:
Data mirroring stores the same data on two hard drives, which provides redundancy and read speed. It’s redundant because if a single drive fails, the other drive still has the data. It’s great for read I/O performance and read throughput because it can independently process two read requests at the same time. With a well-implemented RAID controller that uses mirroring, the read IOPS and read throughput (for two tasks) can be twice that of a single drive. Write IOPS and write throughput aren’t any faster than a single hard drive because writes can’t be processed independently: the data must be written to both hard drives at the same time. The downside to mirroring is that your usable capacity is only half of the total capacity of all your hard drives, so it’s expensive.
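
The behavior described above can be modeled in a few lines. This is a toy sketch, not how any real controller is implemented: the `MirroredPair` class and its methods are hypothetical, each "drive" is just a dictionary, and reads simply alternate between the two drives to stand in for independent read handling.

```python
class MirroredPair:
    """Toy model of a two-drive mirror (hypothetical, for illustration only)."""

    def __init__(self):
        self.drives = [{}, {}]        # block number -> data, one dict per drive
        self.failed = [False, False]
        self._next = 0                # which drive serves the next read

    def write(self, block, data):
        # Every write must go to both drives, so writes are no faster
        # than on a single drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block):
        # Alternate reads between the drives; skip a drive that has failed.
        for _ in range(2):
            i = self._next
            self._next = (self._next + 1) % 2
            if not self.failed[i]:
                return i, self.drives[i][block]
        raise IOError("both drives have failed")

    def fail(self, i):
        self.failed[i] = True
        self.drives[i] = {}


mirror = MirroredPair()
mirror.write(0, b"payroll")
first = mirror.read(0)     # served by drive 0
second = mirror.read(0)    # served by drive 1
mirror.fail(0)
after_failure = mirror.read(0)  # drive 1 still has the data
print(first, second, after_failure)
```

Note that both dictionaries hold identical contents, which is the capacity cost: two drives' worth of hardware stores one drive's worth of data.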

Striping:
Data striping distributes data across multiple hard drives. Striping scales very well on read and write throughput for single tasks, but it has less read throughput than data mirroring when processing multiple tasks. A good RAID controller can produce single-task read/write throughput equal to the combined throughput of all the individual drives. Striping also produces better read and write IOPS, though it’s not as effective on read IOPS as data mirroring. You also get a large consolidated drive volume equal to the total capacity of all the drives in the RAID array. Striping is rarely used by itself because it provides zero fault tolerance: a single drive failure takes down not only the data on that drive but the entire RAID array. Striping is often used in conjunction with data mirroring or with parity.
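
To make the layout concrete, here is a simplified sketch of how a striped array might map logical blocks to drives. It assumes a round-robin layout with a stripe unit of one block; real controllers typically use larger stripe units, and the function names are my own.

```python
def locate_block(logical_block, num_drives):
    """Return (drive index, block offset on that drive) for a logical block."""
    return logical_block % num_drives, logical_block // num_drives

def blocks_lost(failed_drive, total_blocks, num_drives):
    """Logical blocks that lived on the failed drive."""
    return [b for b in range(total_blocks)
            if locate_block(b, num_drives)[0] == failed_drive]

# Six logical blocks spread over a three-drive stripe.
layout = [locate_block(b, 3) for b in range(6)]
lost = blocks_lost(failed_drive=1, total_blocks=6, num_drives=3)

print(layout)
print(lost)
```

Consecutive blocks land on different drives, which is why a single large transfer can keep every drive busy at once. It is also why one failed drive ruins the whole array: the lost blocks are scattered through every file of any size, not confined to one region of the volume.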

Striping with parity:
Striping alone is unreliable because it has no fault tolerance. Striping with parity solves the reliability problem at the expense of some capacity and a big hit on write IOPS and write throughput compared to plain data striping. Data is striped across multiple hard drives just like normal data striping, but parity data is also generated and stored on one or more hard drives. Parity data allows a RAID volume to be reconstructed if one (sometimes two) hard drives fail within the array. Parity can be generated in the RAID controller hardware or in software (driver level, OS level, or add-on volume manager) using the general-purpose processor. The hardware method results in an expensive RAID controller, poor throughput, or both. The software method is computationally expensive, though that’s no longer a problem with fast multi-core processors. Despite its performance and capacity penalty, parity uses up far less capacity than data mirroring while still providing drive fault tolerance, making this a very cost-effective form of reliable large-capacity storage.
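
The text above doesn’t say how parity is computed. Single-parity schemes commonly use a bytewise XOR across the data blocks in a stripe, so the sketch below assumes XOR; the helper name and sample data are hypothetical, and it covers only the single-failure case (surviving two failures requires a second, different calculation).

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks on three drives, parity on a fourth.
data = [b"RAID", b"disk", b"data"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails. XOR the survivors with the parity
# block to rebuild the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)
assert rebuilt == data[1]
```

This also shows where the costs come from under the XOR assumption: one drive’s worth of capacity goes to parity instead of half the array, and every write must also update the parity block, which is the write penalty compared to plain striping.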

George Ou is Technical Director of ZDNet. See his full profile and disclosure of his industry affiliations.
