What is RAID? The concept of RAID, or Redundant Array of Independent Disks, was originally discussed in a Berkeley paper by Patterson, Gibson and Katz. The idea is that instead of writing data block by block over a single disk, the data is spread over several spindles. This gives performance benefits, as data is read off several spindles, and availability benefits, as extra parity data can be generated and stored, so that the data will still be available if one or more disks are lost.
Parity is a means of adding extra data, so that if one of the bits of data is lost, it can be recreated from the parity. For example, suppose a piece of data consists of the four bits 1011. The total number of '1's is odd, so we make the parity bit a 1 and store 10111. Now suppose the third bit is lost, leaving 10?11. The last bit tells us that the data bits should contain an odd number of '1's, but the recognisable data bits only contain an even number of '1's, so the missing bit must be a '1'. This is a very simplistic explanation; in practice, disk parity is calculated on blocks of data using XOR hardware functions. The advantage of parity is that it is possible to recover data from errors. The disadvantage is that more storage space is required. In enterprise disk subsystems, backup disks called 'dynamic spares' are kept ready, so that when a disk is lost, a dynamic spare is automatically swapped in and the faulty disk's contents are rebuilt from the remaining data and the parity data.
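To make the XOR idea concrete, here is a minimal Python sketch that builds a parity block over a 3+1 stripe and then rebuilds a lost block from the survivors. Real controllers do this in hardware on much larger blocks, so this is only an illustration of the principle:

```python
# Minimal sketch of XOR block parity, the same principle RAID5 uses.
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Three data blocks plus one parity block, as in a 3+1 stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose block 1, then rebuild it from the two survivors plus the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt block:", rebuilt)
```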
There has been some speculation in recent years that RAID is no longer relevant. This is based on the fact that disks are now much bigger than they were when RAID was invented, so it takes much longer to swap a dynamic spare in. Why is this important? Until the data is rebuilt there is no protection, so if another disk failed, all the data would be lost. With physical disk sizes about to reach 16TB, that means in a RAID5 7+1 configuration, 112 TB of data could be lost. In that same configuration, the disk controller would have to read 112 TB of data to rebuild the missing 16TB, and that could take days if the system is busy. Also, while the controller is performing this recovery, performance will be affected when data on the rest of the RAID group is accessed.
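As a rough illustration of that arithmetic, the sketch below assumes a sustained rebuild rate of 100 MB/s; this figure is purely illustrative, as real rebuild rates depend on the controller and on how busy the system is:

```python
# Rough rebuild arithmetic for a RAID5 7+1 group of 16TB disks.
# The 100 MB/s sustained rebuild rate is an assumption for illustration
# only; real rates depend on the controller and on host workload.
disk_tb = 16                  # capacity of one disk, in TB
surviving_disks = 7           # disks that must be read to rebuild the 8th

read_tb = disk_tb * surviving_disks          # data read during the rebuild
rebuild_rate_mb_per_s = 100                  # assumed sustained rebuild rate

seconds = read_tb * 1_000_000 / rebuild_rate_mb_per_s   # 1 TB = 1,000,000 MB
print(f"Data read during rebuild: {read_tb} TB")
print(f"Approximate rebuild time: {seconds / 86400:.1f} days")
```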
On the plus side, RAID performance is evolving. Access to the disks gets faster as disks get bigger, and the hardware functions that are used to rebuild the data are constantly being improved. This means that as each new generation of disks arrives, the rebuild performance of RAID controllers improves and keeps pace with the increase in drive capacity.
The conclusion is that RAID is not dead yet, nor is it likely to be for some time.
So which RAID configuration is best? RAID1 is simple to implement and performs well. Rebuild time after a failure is quite fast as just one disk is read. It is more expensive for large disk subsystems as twice the amount of disk is needed for a given capacity. Probably the best solution for small configurations and especially home PCs.
RAID5 is still commonly used in enterprise subsystems, and performs well if it is fronted by a large cache to mask the RAID5 write performance overhead. You will be able to store more data for a given number of physical disks with RAID5: the RAID1 storage overhead is 50%, but a 7+1 RAID5 configuration has just a 12.5% overhead. To put this into specific numbers based on eight 2TB disks, RAID1 will give you 8TB usable, whereas RAID5 will give you 14TB usable. On the minus side, RAID5 is more complex to implement, it will take much longer to rebuild a failed disk, and if two failures do happen, you will lose much more data. RAID5 is still a good solution for enterprise disks, but as disks continue to get cheaper, it may be worth paying the extra money for RAID1 to avoid the double failure risk.
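A quick sketch of where those capacity numbers come from, using the eight 2TB disks in the example above:

```python
# Usable capacity and overhead for the example above: eight 2TB disks.
disks, disk_tb = 8, 2
raw_tb = disks * disk_tb

raid1_usable = raw_tb / 2              # half the disks hold mirror copies
raid5_usable = (disks - 1) * disk_tb   # one disk's worth holds parity

for name, usable in (("RAID1", raid1_usable), ("RAID5 7+1", raid5_usable)):
    overhead = (raw_tb - usable) / raw_tb * 100
    print(f"{name}: {usable:.0f} TB usable, {overhead:.1f}% overhead")
```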
The various types of RAID are explained below. In the diagrams, the square box represents the controller and the cache.
The diagram below illustrates the RAID5 write overhead. If a block of data in a RAID5 stripe is updated, the parity block for that stripe must be updated too. In a simple implementation, the unchanged data blocks in the stripe are read back from the disks, the new parity is calculated, and then the new data block and the new parity block are written out. This is why a RAID5 write operation is usually quoted as requiring 4 IOs. The performance impact is usually masked by a large subsystem cache.
As Nat Makarevitch pointed out, more efficient RAID5 implementations hold on to the original data and use it to generate the parity according to the formula new-parity = old-data XOR new-data XOR old-parity. If the old data block is retained in cache, and it often is, then this just requires one extra IO to fetch the old parity. In the worst case it requires reading two extra blocks, the old data and the old parity, rather than the whole stripe.
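A small sketch of that parity update, checking the shortcut formula against a full parity recalculation over the stripe:

```python
# Sketch of the RAID5 small-write parity update. The shortcut
# new_parity = old_data XOR new_data XOR old_parity gives the same
# result as recalculating parity across the whole stripe.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

stripe = [b"AAAA", b"BBBB", b"CCCC"]               # data blocks in one stripe
old_parity = xor(xor(stripe[0], stripe[1]), stripe[2])

old_data, new_data = stripe[1], b"ZZZZ"            # update block 1 in place
new_parity = xor(xor(old_data, new_data), old_parity)

stripe[1] = new_data
assert new_parity == xor(xor(stripe[0], stripe[1]), stripe[2])
```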
RAID5 often gets a bad press, due to potential data loss on hardware errors and poor performance on random writes. Some database manufacturers will positively tell you to avoid RAID5. The truth is, it depends on the implementation. Avoid software-implemented RAID5; it will not perform. RAID5 on smaller subsystems will not perform unless the subsystem has a large amount of cache. However, RAID5 is fine on enterprise class subsystems like the EMC DMX, the HDS USP or the IBM DDS devices. They all have large, gigabyte-sized caches and force all write IOs to be written to cache, thus guaranteeing performance and data integrity.
Most manufacturers will now let you have some control over the RAID5 configuration. You can select your stripe size and the number of volumes in an array group.
A smaller stripe size is more efficient for a heavy random write workload, while a larger stripe size works better for sequential writes. A smaller number of disks in an array will perform better, but has a bigger parity overhead. Typical configurations are 3+1 (25% parity) and 7+1 (12.5% parity).
The problem with RAID6 is that there is no standard method of implementation; every manufacturer has their own method. In fact there are two distinct architectures, RAID6 P+Q and RAID6 DP.
DP, or Double Parity RAID, uses a mathematical method to generate two independent parity blocks for each stripe of data, and several different mathematical methods are used. P+Q generates a horizontal P parity block, then combines those disks into a second, vertical RAID stripe and generates a Q parity, hence P+Q. One way to visualise this is to picture three standard four-disk (3+1) RAID5 arrays, then add a fourth row of four disks and stripe again vertically, so that each vertical array consists of one disk from each of the first three arrays plus one disk from the fourth row, which holds the Q parity. The consequence is that those sixteen disks will only hold nine disks' worth of data.
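A minimal sketch of that grid idea, using small integers in place of disk blocks; it illustrates the P+Q layout described here rather than any particular vendor's implementation:

```python
# Sketch of the P+Q grid described above: the last column holds the
# horizontal P parity and the bottom row holds the vertical Q parity,
# so 16 positions hold only 9 blocks of data. Small integers stand in
# for disk blocks here.
def xor_all(values):
    result = 0
    for v in values:
        result ^= v
    return result

data = [[0x11, 0x22, 0x33],
        [0x44, 0x55, 0x66],
        [0x77, 0x88, 0x99]]

p = [xor_all(row) for row in data]            # P parity, one per row
q = [xor_all(col) for col in zip(*data)]      # Q parity, one per column

# Lose two blocks in the same row (row 1, columns 0 and 2). The row's
# P parity alone cannot rebuild both, but the column Q parities can.
rebuilt_a = xor_all([data[0][0], data[2][0], q[0]])
rebuilt_b = xor_all([data[0][2], data[2][2], q[2]])
assert (rebuilt_a, rebuilt_b) == (data[1][0], data[1][2])
```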
P+Q architectures tend to perform better than DP architectures and are more flexible in the number of disks that can be in each RAID array. DP architectures usually insist that the total number of disks is prime, something like 4+1, 6+1 or 10+1 (5, 7 or 11 disks). This can be a problem as physical disks usually come in units of eight, and so do not easily fit a prime number scheme.