May 4th, 2007
Comprehensive RAID performance report
Why RAID10 (0+1) superiority is a myth
As much as we love raw sequential throughput, it’s almost worthless for most database applications and mail servers. Those applications have to deal with hundreds or even thousands of tiny requests per second that hit random locations on the hard drives. The drives literally spend more than 95% of their time seeking data rather than actually reading or writing it, which means that even infinite sequential throughput would solve only 5% of the performance problem at best. The kind of storage performance that matters most for these applications is I/O (Input/Output) transactions per second, and the workload heavily favors reads over writes at a typical ratio of 80:20.
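The 95% figure can be turned into a rough upper bound on what faster transfers could buy. The sketch below is plain arithmetic, not a measurement; the function name `best_case_speedup` and the choice of a 2x transfer speedup are illustrative assumptions, and only the 95% seek share comes from the text above.

```python
def best_case_speedup(seek_fraction: float, transfer_speedup: float) -> float:
    """Overall speedup when only the non-seek (transfer) share of each request gets faster.

    seek_fraction:    share of request time spent seeking (0.95 in the text above)
    transfer_speedup: how many times faster the transfer portion becomes
    """
    transfer_fraction = 1.0 - seek_fraction
    return 1.0 / (seek_fraction + transfer_fraction / transfer_speedup)


# Hypothetical scenarios: doubling sequential throughput, then making it infinite.
for label, factor in (("2x throughput", 2.0), ("infinite throughput", float("inf"))):
    speedup = best_case_speedup(0.95, factor)
    time_saved = 1.0 - 1.0 / speedup
    print(f"{label}: {speedup:.3f}x overall, {time_saved:.1%} of request time saved")
```

Even the infinite-throughput case cannot save more than the 5% of time that was not spent seeking, which is why seek-bound workloads are measured in I/O transactions per second rather than megabytes per second.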
The widely accepted assumption in the storage world has been that RAID10 (or 0+1) is the undisputed king of the hill when it comes to I/O performance (RAID0 writes faster, but it is ruled out because of its unreliability), and anyone questioning that assumption is considered almost a heretic within many IT circles. This is all based on the assumption that applications are incapable of using more than one storage volume at a time or that they shouldn’t.
In my last career before I became an IT blogger last year with ZDNet and TechRepublic, I was an IT consultant, and storage engineering was part of my job. I worked with a Fortune 100 company that used SAP with Microsoft SQL Server 2000 on the backend. The SQL transaction times were getting so slow that they even considered building a whole new database server. I looked at the performance data and saw that the CPU never went above 10% utilization, and memory was nowhere near capacity. The choke point was the storage subsystem, which is almost always the culprit in database applications.
The storage subsystem was a high-performance SAN using a 20-drive RAID10 array comprising 10K RPM native Fibre Channel hard drives on a 1-gigabit FC (Fibre Channel) SAN. The knee-jerk assumption was that an upgrade to 2-gigabit would solve the performance problem; the storage industry now pushes 4-gigabit FC SAN because that’s the easy number to market. I offered a non-conventional solution. I knew that even during peak loads during the day, the raw throughput on the database server never exceeded 200 Mbps, let alone one gigabit. The problem was the use of RAID10. I suggested using independent sets of RAID1, which was hard for the team to swallow, and it took some time for me to convince them to try it. It went against all conventional wisdom, but I was lucky to have a boss who trusted me enough to try it, and it was my neck on the line.
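The bandwidth argument above comes down to simple division. As a rough sketch, the snippet below compares the 200 Mbps peak against each link speed; it treats the link speeds as nominal line rates and ignores Fibre Channel encoding overhead, which is a simplifying assumption made here, and `link_utilization` is just an illustrative helper name.

```python
def link_utilization(peak_mbps: float, link_gbps: float) -> float:
    """Fraction of a link's nominal rate consumed at peak load."""
    return peak_mbps / (link_gbps * 1000.0)


PEAK_MBPS = 200  # peak throughput observed on the database server, per the text

for link_gbps in (1, 2, 4):
    share = link_utilization(PEAK_MBPS, link_gbps)
    print(f"{link_gbps}-gigabit FC: {share:.0%} of nominal bandwidth used at peak")
```

With the pipe mostly idle even at 1 gigabit, a wider pipe has nothing to speed up; the waiting was happening at the drives.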
I replaced the massive 20-drive 10K RAID10 array with 8 pairs of RAID1 consisting of sixteen 15K RPM drives. The new 15K RPM drives had roughly a 10% I/O performance advantage over the 10K RPM drives they were replacing, but there were 20% fewer of the newer drives, which meant that drive speed was more or less a wash. The biggest difference would be the RAID configuration. Microsoft SQL Server fortunately permits seamless distribution of its data tables among any number of volumes you can throw at it. Its extent-level striping spreads data and workload evenly across those volumes, and I used that simple mechanism to distribute the database over the 8 pairs of RAID1.
[Update 7/28/2007 - Microsoft has corrected me that it's extent-level striping instead of row-level striping. An extent is a 64KB block of data (eight contiguous 8KB pages), which is the basic unit in which SQL Server manages space.]
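To illustrate why extent-level striping evens out the workload, here is a toy simulation. It is a simplified model, not SQL Server's actual allocator: it assumes strict round-robin placement of extents across eight equally sized volumes (SQL Server's real behavior is a proportional fill across the files in a filegroup) and uniformly random access to extents. The names `volume_for_extent` and `simulate` and the database size are hypothetical.

```python
import random
from collections import Counter

EXTENT_KB = 64    # extent size from the update note above
NUM_VOLUMES = 8   # the eight RAID1 pairs


def volume_for_extent(extent_index: int, num_volumes: int = NUM_VOLUMES) -> int:
    """Simplified round-robin placement: extent i lands on volume i mod n."""
    return extent_index % num_volumes


def simulate(num_extents: int, num_requests: int, seed: int = 0) -> Counter:
    """Count how many random extent reads land on each volume."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(num_requests):
        extent = rng.randrange(num_extents)
        hits[volume_for_extent(extent)] += 1
    return hits


# Hypothetical database of 160,000 extents (roughly 10 GB at 64 KB each).
REQUESTS = 100_000
hits = simulate(num_extents=160_000, num_requests=REQUESTS)
for vol in range(NUM_VOLUMES):
    print(f"volume {vol}: {hits[vol] / REQUESTS:.1%} of requests")
```

Because neighboring extents sit on different volumes, no single table or hot region can pin its traffic to one pair of drives, and each volume ends up with close to one eighth of the requests without any manual tuning.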
As a result of this change in RAID configuration, the queue depth (the number of outstanding I/O transactions backed up due to storage congestion) dropped a whopping 97%! This in turn resulted in a massive reduction in SQL transaction times, from a painfully slow 600ms per transaction to 200ms per transaction. The result was so dramatic that it was hard for everyone to believe it during the first week. They had thought that perhaps there was some mistake or anomaly and that this might be due to a drop in usage, but doubts subsided as time went on and the performance kept up. Even the engineer from the storage vendor who initially doubted the effectiveness of this solution became a believer after he ran some tests to verify that the load was evenly distributed across the 8 RAID1 pairs.
But even with this success under my belt, it was always an uphill battle to convince other companies and their IT staff to consider using independent RAID1 volumes over RAID10. A big part of the problem was that Oracle lacked the ability to seamlessly split a table evenly over multiple volumes. It was still possible to divide up the location of the hundreds of tables that made up a typical database, but it required manual load measurements and manual distribution, which is something that many DBAs (database administrators) refused to do. It also required annual maintenance and a redistribution of tables because workloads change over time. Without extent-level striping, it becomes a question of whether the DBA and IT department want to deal with the additional management overhead.
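The manual distribution described above amounts to a balancing problem: measure each table's I/O load, then spread the tables so no volume is much busier than the others. One simple way to sketch it is a greedy "busiest table to the least-loaded volume" pass. This is an illustration only, not an Oracle feature or a DBA tool; the function `place_tables`, the table names, and the load figures are all made up for the example.

```python
import heapq


def place_tables(table_iops: dict[str, float], num_volumes: int):
    """Greedy placement: take tables from busiest to quietest and put each
    one on whichever volume currently carries the least load.

    Returns (layout, loads): the table names per volume and the summed load per volume.
    """
    heap = [(0.0, vol) for vol in range(num_volumes)]  # (current load, volume index)
    heapq.heapify(heap)
    layout = [[] for _ in range(num_volumes)]
    loads = [0.0] * num_volumes
    for name, iops in sorted(table_iops.items(), key=lambda kv: kv[1], reverse=True):
        load, vol = heapq.heappop(heap)       # least-loaded volume so far
        layout[vol].append(name)
        loads[vol] = load + iops
        heapq.heappush(heap, (loads[vol], vol))
    return layout, loads


# Hypothetical measured loads (I/O transactions per second) for a handful of tables.
measured = {"orders": 400, "items": 300, "customers": 200,
            "audit": 100, "history": 100, "lookup": 50}
layout, loads = place_tables(measured, num_volumes=2)
for vol, (tables, load) in enumerate(zip(layout, loads)):
    print(f"volume {vol}: {tables} -> {load:.0f} IOPS")
```

The catch is the one noted above: the measurements go stale as workloads shift, so the whole exercise has to be repeated, whereas extent-level striping does the balancing automatically.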
For something like a Microsoft Exchange Server, you’re required to have multiple mail stores anyway, so having multiple RAID1 volumes fits naturally into an Exchange environment. Too many Exchange administrators follow the RAID10 advice, and it results in a massive waste of performance.
The other major obstacle I had to overcome was the fact that most storage consultants believe in RAID10 (or even RAID5, which is horrible on write I/O performance) because this was conventional wisdom and they weren’t in the mood to listen to my heresy. So instead of trying to argue with them, I’ll just present the following quantitative analysis comparing the various types of RAID.
George Ou is Technical Director of ZDNet. See his full profile and disclosure of his industry affiliations.






