RAID, or 'Redundant Array of Inexpensive Disks', is the process of combining multiple hard drives or SSDs in parallel as one logical volume, making the array more resistant to disk failures. There are many types of RAID and we will discuss which one to choose.
What is RAID?
Suppose you have two 1
Without RAID, there is no redundancy, but this is not the main problem. Data should never really be lost with a good backup strategy, but if you lose a disk, that server may experience severe downtime during recovery. This is not acceptable in a server environment and is much, much worse than a temporary data loss.
RAID arrays can be rebuilt while they are still in use, and if one disk fails, there is no need to restore from backups. This is the main advantage of RAID arrays. Servers are designed to never go down, in many cases not even for maintenance: you can literally pull a drive from a production web server and it will keep chugging along, albeit with lower performance.
In many ways, RAID is much better than one big drive. One large 8 TB drive is not as resilient as five 2 TB drives configured in RAID 5. It will be difficult to find a server that comes with only one drive installed.
RAID works best with identical disks. It can work with different disks, but you are usually limited to the speed and space of the slowest and smallest disk, making it less than optimal.
This whole discussion only really applies if you manage a server yourself, such as a home NAS with many hard drives; in that case, the type of RAID you choose is very important. If you rent virtual servers from AWS or another provider, RAID is usually configured for you by the hosting company, because that level of control is abstracted away from you.
A note before we start: the numbers used to indicate different RAID levels actually mean nothing. RAID 5 is not five times better than RAID 1. There are other strange RAID levels, such as RAID 2, 3 and 4, but they are not used in practice and are not worth explaining.
JBOD
This is not technically a RAID configuration, but it is worth mentioning here. JBOD stands for 'Just a Bunch Of Disks', because that's really all it is. JBOD simply merges disks into one large volume. It offers no performance boost and no redundancy, but it doesn't matter at all which drives go into it.
Many RAID controllers offer a JBOD mode. You probably shouldn't be using it unless you've got a bunch of different sized drives and want to link them together.
RAID 0
Data in RAID 0 is striped across multiple disks. When you read a file from the array, you read from several disks in parallel, which makes RAID 0 much faster than a single disk.
However, there is no mirroring, parity or other redundancy mechanism, so if a single disk fails, you lose all data on the entire array. RAID 0 is therefore used when speed matters and redundancy is not needed.
In a way, RAID 0 is very similar to having no RAID at all. It gives you the advantage of having all the disks in one big volume, as well as much faster access speeds. However, a single drive failure can be catastrophic for the data on the array, so you should never run RAID 0 without a backup solution unless the data is intended to be 100% ephemeral.
RAID 0 also maximizes capacity, because no space is used for redundancy. If you have two 1 TB drives, your array size is 2 TB. However, every disk in a RAID 0 array contributes only as much space as the smallest disk. If you try to RAID 0 a 2 TB drive with a 1 TB drive, you will only have 2 TB of space, and 1 TB of the larger drive will be completely wasted.
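The capacity rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming every disk contributes as much space as the smallest one; the function name `raid0_capacity` is made up for this example.

```python
def raid0_capacity(disk_sizes_tb):
    """Return (usable, wasted) capacity in TB for a RAID 0 array.

    Assumes each disk contributes only as much space as the smallest disk.
    """
    if len(disk_sizes_tb) < 2:
        raise ValueError("RAID 0 needs at least two disks")
    smallest = min(disk_sizes_tb)
    usable = smallest * len(disk_sizes_tb)
    wasted = sum(disk_sizes_tb) - usable
    return usable, wasted


print(raid0_capacity([1, 1]))  # two 1 TB drives: (2, 0)
print(raid0_capacity([2, 1]))  # 2 TB + 1 TB drive: (2, 1)
```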
RAID 0 with SSDs is common and more reasonable, considering that SSDs have lower failure rates. This is a common setup in high-end desktop systems, where speed is more important than redundancy.
RAID 1
RAID 1 is another basic type of RAID. As with RAID 0, it uses two or more disks, but instead of striping data across them, the data is mirrored from the first disk to the second (and to any additional disks in the array). If you have two disks, one of them is used entirely as a kind of real-time backup, halving your total storage capacity. If either disk kicks the bucket, you can continue reading from the other disk and rebuild the array after replacing the failed disk.
This has some advantages for read performance, since two disks can be used, but because it reads the same data from each disk, it is often not as good as RAID 0. The write performance is limited to the speed of the slowest disk.
RAID 1 is your only practical choice if you have two disks and cannot afford to have a disk failure wipe out your data. However, it is not the most efficient option, since you cut your storage capacity in half and thus pay twice as much for the same usable space as with single drives.
However, the redundancy in a server environment is worth much more than the price of a single drive. If you only need a basic configuration, choose a simple RAID 1 array. Most RAID controllers deploy RAID 1 by default when connecting two drives.
RAID 5
RAID 5 is where things get interesting. Rather than duplicating data like RAID 1, RAID 5 uses a much more efficient method: parity.
Parity is a form of error checking, like a hash, but much simpler. It is often used to ensure that network traffic is not corrupted on the wire. Suppose you have 7 bits of data that you want to send to someone, and you want to be sure it arrives intact. If a bit were flipped during transmission, the recipient would have no way of knowing. The solution is to count the bits that are set to 1: if there is an even number, the parity is 0; if there is an odd number, the parity is 1. You append this parity bit to the data you send, and when the person on the other side receives it, they calculate the parity themselves. If an error has occurred and a bit has been flipped (even the parity bit itself), the receiver will know and can request that the data be sent again. Of course, if there are two errors in a single transmission this system breaks, but that is not very common.
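As a minimal sketch of the scheme just described (even parity over 7 data bits), the following Python shows the sender appending a parity bit and the receiver checking it. The helper names are invented for this illustration and do not come from any networking library.

```python
def parity_bit(bits):
    """Even parity: 0 if the number of 1 bits is even, 1 if it is odd."""
    return sum(bits) % 2


def add_parity(bits):
    """Sender side: append the parity bit to the data bits."""
    return list(bits) + [parity_bit(bits)]


def looks_intact(frame):
    """Receiver side: a valid frame always contains an even number of 1 bits."""
    return sum(frame) % 2 == 0


data = [1, 0, 1, 1, 0, 0, 1]   # 7 data bits with four 1s, so parity is 0
frame = add_parity(data)

one_flip = frame.copy()
one_flip[2] ^= 1               # a single flipped bit is detected

two_flips = frame.copy()
two_flips[0] ^= 1
two_flips[5] ^= 1              # two flipped bits cancel out and go unnoticed

print(looks_intact(frame), looks_intact(one_flip), looks_intact(two_flips))
```

The last case shows the limitation mentioned above: a second error in the same transmission hides the first.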
Instead of storing copies of the data (which would be like sending a message twice), RAID 5 just stores parity. You can think of it as RAID 0 with redundancy, and it requires at least three disks. All but one of the disks are used as a regular RAID 0 array, and the last disk is used for parity. If one of the disks fails, you can reverse the parity calculation to restore all the data that was on it (although this is a lengthy and intensive operation).
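To make "reversing the parity calculation" concrete, here is a small Python sketch that uses byte-wise XOR as the parity function, which is the usual way this idea is described. The disk contents and the `xor_blocks` helper are made up for the example, and real controllers work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the parity of those blocks)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two hypothetical data disks and one parity disk, each holding one 4-byte block.
disk0 = b"RAID"
disk1 = b"five"
parity = xor_blocks([disk0, disk1])

# Pretend disk1 has died: XOR the surviving disk with the parity to get it back.
rebuilt = xor_blocks([disk0, parity])
print(rebuilt)
```

Because XOR is its own inverse, the same function both computes the parity and recovers the missing block.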
In practice, RAID 5 does not use a dedicated disk for parity, because it is faster to stripe the parity across all disks, but you can think of it this way when calculating how much space a RAID 5 array will give you. Simply add up the capacity of all your disks except one, and that's how much space you have. RAID 5 becomes more space-efficient with more disks: three disks give 67% efficiency, but 10 disks give 90%. This significantly reduces costs compared to RAID 1.
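The space calculation is simple enough to check in code. This sketch assumes identical disks and uses a hypothetical helper, `raid5_usable`, that returns the usable capacity and the fraction of raw space that remains available.

```python
def raid5_usable(disk_count, disk_size_tb):
    """Return (usable TB, efficiency) for a RAID 5 array of identical disks."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    usable = (disk_count - 1) * disk_size_tb   # one disk's worth goes to parity
    efficiency = (disk_count - 1) / disk_count
    return usable, efficiency


for n in (3, 5, 10):
    usable, efficiency = raid5_usable(n, 2)
    print(f"{n} disks: {usable} TB usable, {efficiency:.0%} efficient")
```

With 2 TB disks this prints 4 TB at 67% for three disks, 8 TB at 80% for five, and 18 TB at 90% for ten.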
RAID 5 is not without drawbacks, however. Since parity must be calculated when writing to disk, write performance suffers. The problem becomes even bigger when you consider that to change a single bit on one disk, the rest of the stripe must be read to recalculate the parity for that block. As a rough rule, if RAID 0 provides performance scaling with n disks, RAID 5 provides n - 1 performance for write operations. With a large enough array, the problem isn't too bad.
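The extra work on writes can be illustrated with XOR parity. In this sketch, `full_recompute` reads every data block in the stripe, while `read_modify_write` is a commonly described shortcut that only needs the old block and the old parity. Both names and the tiny 4-bit "blocks" are assumptions made for the example, not a description of any particular controller.

```python
def full_recompute(data_blocks):
    """Recalculate parity by reading every data block in the stripe."""
    parity = 0
    for block in data_blocks:
        parity ^= block
    return parity


def read_modify_write(old_parity, old_block, new_block):
    """Update parity from just the old block, the new block and the old parity."""
    return old_parity ^ old_block ^ new_block


stripe = [0b1010, 0b0110, 0b1111]   # data blocks on three hypothetical disks
parity = full_recompute(stripe)

new_block = 0b0001                  # overwrite the block on the second disk
updated = read_modify_write(parity, stripe[1], new_block)
stripe[1] = new_block

print(updated == full_recompute(stripe))
```

Either way, one logical write turns into several reads and writes, which is where the write penalty comes from.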
No matter how many disks you have, you can only survive one disk failure. This doesn't seem like a big deal, as failures aren't very common and you're unlikely to experience two at the same time, but rebuilding an array is very intensive on your drives: you're basically reading every bit of data from each of them at the time they are most vulnerable. So once one drive has failed, it is more likely that another will fail during the rebuild.
RAID 5 should be your default option if you have three disks, since RAID 1 would be a waste of space. If you have four drives it is probably still the best option, but the other two options on this list become available to you too.
RAID 6
RAID 6 is like RAID 5, except that it keeps a second, independent set of parity, as if there were two "parity disks". This allows your array to survive two drive failures. However, write performance is worse, at n - 2, and you will of course have less space.
There is not much more to say about it. If you have a large number of disks (6, 8 or more), consider RAID 6 for the extra redundancy. On its own, RAID 6 covers the first part of the 3-2-1 backup strategy: keep at least three copies of your data, on two different types of media, with at least one copy in a different location. RAID 6 can survive two disk failures, making it functionally the same as RAID 1 with three disks (minus rebuild times).
In practice, RAID 6 will almost never experience a total array failure, especially if you add more parity disks to the equation. This, combined with backups and copies in other data centers, is how archival services like AWS Glacier and Backblaze achieve 99.999999999% durability.
RAID 10 (1 + 0)
RAID 10 is technically a form of nested RAID, which is a complex topic in its own right. Basically, if you have four drives and don't want to use RAID 5 or 6, your only other options are RAID 0 and RAID 1, both of which have their problems. Instead, split the drives into two pairs, create a RAID 1 array from each pair, and then combine those two arrays into one large RAID 0 array. RAID 10 requires a minimum of four disks and an even number of disks in total.
This gives you all the benefits of RAID 1 and RAID 0 without many of the drawbacks: fast read speeds, fast write speeds, high redundancy and easy rebuilds, while still being able to use half the total space of all your drives. RAID 10 is actually more resilient than RAID 1. For example, if disks 0 and 1 form one mirror and disks 2 and 3 form the other, disk 1 and disk 3 can both fail and the array can still be completely rebuilt (although if both disk 0 and disk 1 fail, the array is unrecoverable).
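Which double failures a four-disk RAID 10 survives can be enumerated directly. The sketch below assumes disks 0 and 1 form one mirror and disks 2 and 3 form the other, which is the layout the failure example in the text implies; `MIRROR_PAIRS` and `survives` are names invented for this illustration.

```python
from itertools import combinations

# Assumed layout: two RAID 1 mirrors, striped together as RAID 0.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def survives(failed, mirror_pairs=MIRROR_PAIRS):
    """The array survives as long as no mirror has lost both of its disks."""
    return all(not set(pair) <= set(failed) for pair in mirror_pairs)


all_disks = [disk for pair in MIRROR_PAIRS for disk in pair]
for failed in combinations(all_disks, 2):
    print(failed, "ok" if survives(failed) else "array lost")
```

Of the six possible two-disk failures, only the two that take out a whole mirror, (0, 1) and (2, 3), lose the array.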
RAID 10 is a common RAID level for servers. It is very fast and can survive at least one disk failure. The only real issue is the price, as you still pay double to keep copies of all your data, but for general workloads RAID 10 beats almost any other RAID configuration on speed, losing only to RAID 0 in throughput.
RAID 50/60
RAID 50/60 is basically two RAID 5 or RAID 6 arrays combined in RAID 0. As with RAID 10, this improves performance. The main improvement is in write performance, since reading the other disks of a smaller sub-array when recalculating parity is faster.
It requires at least six disks (eight in the case of RAID 60), and since there are separate RAID 5 arrays, you need additional parity drives, making it less space-efficient but slightly more resilient. Overall, RAID 50 is essentially a more performant version of RAID 5.
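The space cost of the extra parity can be worked out with a short calculation. This sketch assumes identical disks split evenly into two sub-arrays, as described above; the function name and its parameters are hypothetical.

```python
def nested_parity_capacity(disk_count, disk_size_tb, parity_per_group, groups=2):
    """Usable TB for RAID 50 (parity_per_group=1) or RAID 60 (parity_per_group=2)."""
    if disk_count % groups:
        raise ValueError("disks must divide evenly between the sub-arrays")
    per_group = disk_count // groups
    if per_group < parity_per_group + 2:
        raise ValueError("not enough disks in each sub-array")
    # Each sub-array gives up parity_per_group disks' worth of space.
    return groups * (per_group - parity_per_group) * disk_size_tb


print(nested_parity_capacity(6, 2, parity_per_group=1))  # RAID 50, six 2 TB disks: 8
print(nested_parity_capacity(8, 2, parity_per_group=2))  # RAID 60, eight 2 TB disks: 8
```

For comparison, the same six 2 TB disks in a single RAID 5 array would give 10 TB, so RAID 50 gives up one more disk's worth of space for its second parity set.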