What is RAID? The concept of RAID, or Redundant Array of Independent Disks, was originally discussed in a Berkeley paper by Patterson, Gibson and Katz. The idea is that instead of writing data block by block to a single disk, the data is spread over several spindles. This gives performance benefits, as data can be read from several spindles in parallel, and availability benefits, as extra parity data can be generated and stored so that the data is still available if one or more disks are lost.
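The striping idea can be sketched in a few lines of Python. This is an illustration only, not any vendor's implementation; the disk count, block size and round-robin layout are all assumptions made for the example.

```python
# Toy sketch of block striping: data blocks are distributed round-robin
# across the disks, so a large read can be serviced by all spindles in
# parallel. Block size and disk count are arbitrary choices here.
def stripe(data: bytes, n_disks: int, block_size: int) -> list[list[bytes]]:
    disks: list[list[bytes]] = [[] for _ in range(n_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

layout = stripe(b"ABCDEFGHIJKLMNOP", n_disks=4, block_size=2)
print(layout[0])  # disk 0 holds blocks 0 and 4: [b'AB', b'IJ']
```

Reading the file back touches all four disks at once, which is where the performance benefit of striping comes from.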
Parity is a means of adding extra data so that if one bit of the data is lost, it can be recreated from the parity. For example, suppose a data value consists of the four bits 1011. The total number of '1's is odd, so we set the parity bit to 1, and the stored value becomes 10111. Now suppose the third bit is lost, leaving 10?11. The parity bit tells us that the data bits should contain an odd number of '1's; the number of recognisable '1's is even, so the missing bit must be a '1'. This is a very simplistic explanation; in practice, disk parity is calculated on blocks of data using XOR hardware functions. The advantage of parity is that it makes it possible to recover data after errors. The disadvantage is that more storage space is required. In enterprise disk subsystems, backup disks called 'dynamic spares' are kept ready, so that when a disk is lost, a dynamic spare is automatically swapped in and the faulty disk's contents are rebuilt from the remaining data and the parity data.
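The parity scheme above amounts to XOR: the parity bit is the XOR of the data bits, and a single lost bit is the XOR of everything that survives. A minimal sketch, using the 1011 example from the text:

```python
# The parity bit is the XOR of the data bits; a single lost bit can be
# recovered as the XOR of the surviving bits and the parity bit.
# Real subsystems apply the same XOR across whole blocks in hardware.
def xor_bits(bits: list[int]) -> int:
    p = 0
    for b in bits:
        p ^= b
    return p

data = [1, 0, 1, 1]          # the example value 1011
p = xor_bits(data)           # 1: the data holds an odd number of 1s

# Lose the third bit; the survivors plus the parity bit recover it.
survivors = [1, 0, 1]        # bits 1, 2 and 4
recovered = xor_bits(survivors) ^ p
print(recovered)             # 1, the lost bit
```

The same arithmetic works byte-for-byte on data blocks, which is why the rebuild of a failed disk is just a large XOR over the surviving disks.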
Flash disks have changed the data storage picture in recent years, but what difference have they made to RAID? Traditional RAID on spinning disks performs two functions: data is protected from hardware failure, and when striped over multiple disks, data can be retrieved faster from multiple spindles than from a single spindle.
This second function does not apply to Flash storage, which has no mechanical components, so data can be read from different parts of a Flash storage device in parallel anyway.
The first function also differs. When a spinning disk fails, the entire drive is lost, along with all the data on it. When a Flash drive fails, usually only parts of the drive fail, and the drive itself has spare capacity to replace some of the faulty areas. RAID protection is therefore less important with Flash drives.
So what has happened to RAID in SSD storage devices? Well, where a manufacturer just replaced its spinning disks with Flash devices, it kept the RAID configuration the same, typically with RAID1, RAID5 or RAID6 options. The difference has come from those vendors who have built new devices specifically for Flash/SSD storage; they have tended to take a different approach to data protection. For example:
Pure Storage recognised that, because of the way SSDs are built, failure hotspots tend to occur in localised areas of the SSD. The reason is that each erase-then-write operation wears the non-conducting internal substrate, so the bit error rate of a Flash device rises as the device ages. Pure Storage RAID 3D is designed to check for these bit errors and correct them without needing to swap out the entire drive. It also uses independent checksums and dedicated parity to correct the bit errors.
NetApp's SolidFire all-flash array uses what they call a post-RAID distributed replication algorithm. This solution spreads redundant copies of each disk's data throughout all the other disks in the cluster. If a disk fails, the IO load it was serving is spread evenly among all the remaining disks in the system, and the data is rebuilt in parallel into free space on all remaining disks rather than onto a dedicated spare drive, allowing rebuilds in a matter of seconds or minutes rather than hours or days.
It could be, then, that as these newer Flash storage systems develop, traditional RAID will no longer be relevant.
So, if you still use traditional RAID, which configuration is best? RAID1 is simple to implement, performs well, and is probably the best solution for small configurations, particularly home PCs. RAID6 is usually preferred for enterprise subsystems, especially those using large disks.
RAID1 can only tolerate one disk failure, but as the RAID protection can be restored by reading just one disk, the risk of data loss is low, especially if the disks are relatively small. The issue with RAID1 is that only half the installed capacity is usable.
RAID5 can also tolerate only one failure, and a rebuild can take some time with large disks, increasing the chance that a second disk fails during the rebuild and all the data is lost.
RAID6 can tolerate two disk failures, so once a disk fails, two more must fail during the rebuild before data is lost. The RAID overhead depends on how many disks are in the RAID rank; for an 8-disk array it is 25%.
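The capacity overheads discussed above can be put in one place. A small sketch, using the usual rule of thumb that RAID1 mirrors everything while RAID5 and RAID6 give up one and two disks' worth of capacity respectively:

```python
# Redundancy overhead as a fraction of raw capacity, for the RAID
# levels compared above. n is the number of disks in the array.
def overhead(scheme: str, n: int) -> float:
    if scheme == "RAID1":
        return 0.5       # mirrored: half the installed capacity is redundant
    if scheme == "RAID5":
        return 1 / n     # one disk's worth of parity
    if scheme == "RAID6":
        return 2 / n     # two disks' worth of parity
    raise ValueError(f"unknown scheme: {scheme}")

print(overhead("RAID6", 8))  # 0.25, the 25% figure quoted for an 8-disk array
```

The trade-off is visible in the numbers: RAID6's overhead falls as the rank grows, but larger ranks also mean longer rebuilds.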
The various types of RAID are explained below. In the diagrams, the square box represents the controller and the cache. Blue and yellow blocks represent data, and red blocks represent parity. For simplicity, the dynamic diagrams show each IO as a RAID block; in practice, RAID blocks are fixed size, so IOs are split into RAID blocks as appropriate. The RAID striping and parity are usually generated by ASICs.
If a block of data on a RAID5 disk is updated, then in the worst case, all the unchanged data blocks in the RAID stripe have to be read back from the disks and new parity calculated before the new data block and new parity block can be written out. A RAID5 write operation can therefore require several extra disk reads. The performance impact is usually masked by a large subsystem cache.
More efficient RAID5 implementations hang on to the original data and use it to generate the parity with the formula new-parity = old-data XOR new-data XOR old-parity. If the old data block is retained in cache, and it often is, this requires just one extra fetch, for the old parity.
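The shortcut formula can be checked directly with XOR over bytes. A minimal sketch, with all block contents invented for the example:

```python
# Byte-wise XOR of two equal-length blocks, standing in for the
# subsystem's XOR hardware.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0x0F, 0xF0])
new_data   = bytes([0xAA, 0x55])
other_data = bytes([0x33, 0xCC])      # the unchanged block in the stripe
old_parity = xor_blocks(old_data, other_data)

# new-parity = old-data XOR new-data XOR old-parity
new_parity = xor_blocks(xor_blocks(old_data, new_data), old_parity)

# Sanity check: recomputing parity over the whole stripe agrees,
# without ever having read other_data for the update.
assert new_parity == xor_blocks(new_data, other_data)
```

This works because XORing old-data into old-parity cancels the old block's contribution, leaving only new-data XOR other-data, exactly what a full recompute would produce.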
The problem with RAID6 is that there is no standard method of implementation; every manufacturer has their own method. In fact there are two distinct architectures, RAID6 P+Q and RAID6 DP.
DP, or Double Parity, RAID uses a mathematical method to generate two independent parity blocks for each stripe of data, and several different mathematical methods are in use.
P+Q generates a horizontal P parity block, then combines those disks into a second RAID stripe and generates a Q parity, hence P+Q. The GIF below shows how RAID6 could be striped over 8 disks; those 8 disks will only contain 6 disks' worth of data.
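A toy illustration of the two-level layout described above. This sketch simplifies heavily: both parities here are plain XOR, one across rows (P) and one down a second stripe (Q), whereas real P+Q implementations compute Q with a different code (often Reed-Solomon) so that the two parities are fully independent. All data values are invented for the example.

```python
# XOR-reduce a list of equal-length blocks into one parity block.
def xor_all(blocks) -> bytes:
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

# Toy 2x3 grid of data blocks: each row gets a horizontal P parity,
# and a second-level stripe down the columns gets a Q parity.
# (Simplified: real Q parity uses a distinct code, not plain XOR.)
grid = [[bytes([1]), bytes([2]), bytes([3])],
        [bytes([4]), bytes([5]), bytes([6])]]
p = [xor_all(row) for row in grid]        # horizontal P parity, one per row
q = [xor_all(col) for col in zip(*grid)]  # second-level Q parity, one per column

# A lost block can be rebuilt either from its row (via P) or from its
# second-level stripe (via Q).
lost = grid[0][1]
from_row = xor_all([grid[0][0], grid[0][2], p[0]])
from_col = xor_all([grid[1][1], q[1]])
print(from_row == from_col == lost)  # True
```

Having two independent ways to rebuild each block is what lets a P+Q array survive two simultaneous disk failures.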
P+Q architectures tend to perform better than DP architectures and are more flexible in the number of disks that can be in each RAID array. DP architectures usually insist that the total number of disks is a prime number, something like 4+1, 6+1 or 10+1. This can be a problem, as physical disks usually come in units of eight, which do not fit easily into a prime number scheme.