
Monday, 28 October 2013

RAID: Redundant Array of Independent Disks

Redundant Array of Independent Disks: History of RAID


RAID (redundant array of independent disks, originally redundant array of inexpensive disks) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways, called "RAID levels", depending on the level of redundancy and performance required.
The term "RAID" was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.[3] Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks as a means of disassociating a low-cost expectation from RAID technology.[4][5]
Data striping:
In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices. Striping is useful when a processing device requests data more quickly than a single storage device can supply it. Because the segments live on multiple devices, several of them can be accessed concurrently. This increases data access throughput and avoids leaving the processor idle while it waits for data. Striping is used across disk drives in RAID storage, network interfaces in Grid-oriented Storage, and RAM in some systems.
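As a minimal sketch (not from the article itself), the Python snippet below shows how a striped layout might map a logical block number to a physical disk and a block offset on that disk; the four-disk width and the function name are assumptions made purely for illustration.

    # Minimal sketch of block-level striping across several disks (illustrative only).
    NUM_DISKS = 4  # assumed array width

    def locate_stripe_block(logical_block):
        """Map a logical block number to (disk index, block offset on that disk)."""
        disk = logical_block % NUM_DISKS      # consecutive blocks land on different disks
        offset = logical_block // NUM_DISKS   # position of the block within that disk
        return disk, offset

    # Logical blocks 0-3 go to disks 0-3 at offset 0, blocks 4-7 to disks 0-3 at
    # offset 1, so a sequential read touches all four disks in parallel.
    for block in range(8):
        print(block, locate_stripe_block(block))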

RAID parity

Further information: Parity bit
Many RAID levels employ an error protection scheme called "parity", a widely used method in information technology to provide fault tolerance in a given set of data. Most use the simple XOR parity described in this section, but RAID 6 uses two separate parities based respectively on addition and multiplication in a particular Galois Field[12] or Reed-Solomon error correction.
The XOR operator is central to how parity data are created and used within an array. It is used both to protect data and to recover missing data.
As an example, consider a simple RAID made up of 6 drives (4 for data, 1 for parity, and 1 for use as a hot spare), where each drive has only a single byte worth of storage (a '-' represents a bit, the value of which doesn't matter at this point in the discussion):
Drive #1: -------- (Data)
Drive #2: -------- (Data)
Drive #3: -------- (Data)
Drive #4: -------- (Data)
Drive #5: -------- (Hot Spare)
Drive #6: -------- (Parity)
Suppose the following data are written to the drives:
Drive #1: 00101010 (Data)
Drive #2: 10001110 (Data)
Drive #3: 11110111 (Data)
Drive #4: 10110101 (Data)
Drive #5: -------- (Hot Spare)
Drive #6: -------- (Parity)
Every time data are written to the data drives, a parity value must be calculated in order for the array to be able to recover in the event of a failure. To calculate the parity for this RAID, a bitwise XOR of each drive's data is calculated as follows, the result of which is the parity data:
00101010 XOR 10001110 XOR 11110111 XOR 10110101 = 11100110
The parity data 11100110 are then written to the dedicated parity drive:
Drive #1: 00101010 (Data)
Drive #2: 10001110 (Data)
Drive #3: 11110111 (Data)
Drive #4: 10110101 (Data)
Drive #5: -------- (Hot Spare)
Drive #6: 11100110 (Parity)
Suppose Drive #3 fails. To restore the contents of Drive #3, the same XOR calculation is performed using the data on the remaining data drives and the parity data (11100110) stored on Drive #6:
00101010 XOR 10001110 XOR 11100110 XOR 10110101 = 11110111
The XOR operation will yield the missing data. With the complete contents of Drive #3 recovered, the data are written to the hot spare, which then acts as a member of the array and allows the group as a whole to continue operating.
Drive #1: 00101010 (Data)
Drive #2: 10001110 (Data)
Drive #3: --Dead-- (Data)
Drive #4: 10110101 (Data)
Drive #5: 11110111 (Hot Spare)
Drive #6: 11100110 (Parity)
At this point the failed drive has to be replaced with a working one of the same size. Depending on the implementation, either the new drive becomes the new hot spare and the old hot spare continues to act as a data drive of the array, or (as illustrated below) the original hot spare's contents are automatically copied to the new drive by the array controller, allowing the original hot spare to return to its original role. The resulting array is identical to its pre-failure state:
Drive #1: 00101010 (Data)
Drive #2: 10001110 (Data)
Drive #3: 11110111 (Data)
Drive #4: 10110101 (Data)
Drive #5: -------- (Hot Spare)
Drive #6: 11100110 (Parity)
This same basic XOR principle applies to parity within RAID groups regardless of capacity or number of drives. As long as there are enough drives present to allow for an XOR calculation to take place, parity can be used to recover data from any single drive failure. (A minimum of three drives must be present in order for parity to be used for fault tolerance, because the XOR operator requires two operands, and a place to store the result).
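The worked example above can be reproduced with a few lines of Python. This is only a minimal sketch of the same XOR arithmetic; the variable names are illustrative and do not come from any particular RAID implementation.

    from functools import reduce

    # Data bytes written to the four data drives in the example above.
    data_drives = [0b00101010, 0b10001110, 0b11110111, 0b10110101]

    # Parity is the bitwise XOR of all data bytes.
    parity = reduce(lambda a, b: a ^ b, data_drives)
    print(format(parity, '08b'))     # 11100110, the value written to the parity drive

    # Suppose Drive #3 (index 2) fails: XOR the surviving data bytes with the
    # parity byte to reconstruct the missing data.
    surviving = [d for i, d in enumerate(data_drives) if i != 2]
    recovered = reduce(lambda a, b: a ^ b, surviving + [parity])
    print(format(recovered, '08b'))  # 11110111, the original contents of Drive #3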

Mirroring versus parity RAID levels in relational databases

A common opinion (and one which serves to illustrate the dynamics of proper RAID deployment) is that RAID 10 (a non-parity, mirrored RAID) is inherently better for relational databases than RAID 5, because RAID 5 requires the recalculation and redistribution of parity data on a per-write basis.[13]
There are, however, other considerations which must be taken into account besides performance. RAID 5 and other non-mirror-based arrays offer a lower degree of resiliency than RAID 10 by virtue of RAID 10's mirroring strategy. In a RAID 10 array, I/O can continue even after multiple drive failures, as long as no mirrored pair loses both of its drives. By comparison, in a RAID 5 array, any failure involving more than one drive renders the array itself unusable, because parity recalculation is then impossible to perform. Thus, RAID 10 is frequently favored because it provides the lowest level of risk.[14]
Modern SAN design largely masks any performance hit while a RAID is in a degraded state, by virtue of being able to perform rebuild operations either in-band or out-of-band with respect to existing I/O traffic. Given the rare nature of drive failures in general, and the exceedingly low probability of multiple concurrent drive failures occurring within the same RAID, the choice of RAID 5 over RAID 10 often comes down to the preference of the storage administrator, particularly when weighed against other factors such as cost, throughput requirements, and physical spindle availability.
Basic concepts used by RAID systems
RAID uses a few basic ideas, which were described in the article "RAID: High-Performance, Reliable Secondary Storage" by Peter Chen and others, published in 1994.[2]
Caching
Caching also has its uses in RAID systems; different kinds of caches are used, depending on the implementation.
In modern systems, a write request is reported as done once the data has been written to the cache. This does not mean that the data has been written to the disk. Requests from the cache are not necessarily handled in the same order in which they were written to the cache. As a result, if the system fails, some data may not yet have been written to the disk involved. For this reason, many systems use a cache that is backed by a battery.
Mirroring: More than one copy of the data
Mirroring is a simple idea: instead of the data being stored in only one place, several copies are kept. These copies are usually on different hard disks (or disk partitions). If there are two copies, one of them can fail without the data being lost, as it is still present on the other copy. Mirroring can also give a boost when reading data, because it can be read from whichever disk responds fastest. Writing data is slower, though, because all copies need to be updated.
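A minimal sketch of the idea, assuming two in-memory block lists standing in for mirrored disks (the class and method names are illustrative only):

    class MirroredPair:
        """Toy two-way mirror: every write updates both copies, reads use either."""

        def __init__(self, num_blocks):
            self.copies = [[None] * num_blocks, [None] * num_blocks]

        def write(self, block, data):
            for copy in self.copies:      # writes are slower: every copy must be updated
                copy[block] = data

        def read(self, block, failed=None):
            for i, copy in enumerate(self.copies):
                if i != failed:           # any surviving copy can service the read
                    return copy[block]
            raise IOError("all copies failed")

    mirror = MirroredPair(4)
    mirror.write(0, b"hello")
    print(mirror.read(0, failed=0))       # still readable after one copy has failed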

Striping: Part of the data is on another disk
With striping, the data is split into different parts. These parts then end up on different disks (or disk partitions). This means that writing data is faster, as it can be done in parallel. It does not protect against faults, however, because each block of data is stored on only one disk.
Error correction and faults
It is possible to calculate different kinds of checksums. Some checksum methods make it possible to detect an error; most RAID levels that use redundancy can do this. Other methods are more complex to compute, but they make it possible not only to detect an error, but also to fix it.
Hot spares: using more disks than needed
Many RAID implementations support something called a hot spare. A hot spare is an empty disk that is not used in normal operation. When a disk fails, its data can be rebuilt directly onto the hot spare disk. The failed disk then only needs to be replaced by a new empty drive, which becomes the new hot spare.
Stripe size and chunk size: spreading the data over several disks
RAID works by spreading the data over several disks. Two of the terms often used in this context are stripe size and chunk size.
The chunk size is the smallest block of data that is written to a single disk of the array. The stripe size is the size of a block of data that is spread over all disks. With four disks and a stripe size of 64 kilobytes (kB), for example, 16 kB is written to each disk; the chunk size in this example is therefore 16 kB. Making the stripe size bigger gives a faster data transfer rate, but also a bigger maximum latency, in this case the time needed to get a block of data.
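As a minimal sketch of that arithmetic (the variable and function names are assumptions made for illustration, not terms defined by the article):

    # Relationship between stripe size, chunk size and number of disks (illustrative).
    num_disks = 4
    stripe_size_kb = 64

    chunk_size_kb = stripe_size_kb // num_disks
    print(chunk_size_kb)        # 16 kB written to each disk per stripe, as in the example

    # A byte offset within one stripe lands on the disk that holds that chunk.
    def disk_for_offset(offset_kb):
        return (offset_kb // chunk_size_kb) % num_disks

    print(disk_for_offset(40))  # offset 40 kB falls into the third chunk, i.e. disk 2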

Putting disks together: JBOD, concatenation or spanning

http://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/JBOD.svg/200px-JBOD.svg.png
JBOD with 3 disks of different sizes
Many controllers (and also software) can put disks together in the following way: the first disk is used until it is full, then the second, and so on. In that way, several smaller disks look like one larger disk. This is not really RAID, as there is no redundancy. Spanning can also combine disks in situations where RAID 0 cannot do anything useful. Generally, this arrangement is called just a bunch of disks (JBOD).
This is like a distant relative of RAID because the logical drive is made of different physical drives. Concatenation is sometimes used to turn several small drives into one larger, useful drive, which cannot be done with RAID 0. For example, JBOD could combine 3 GB, 15 GB, 5.5 GB, and 12 GB drives into a logical drive of 35.5 GB, which is often more useful than the drives on their own.
In the diagram above, data are concatenated from the end of disk 0 (block A63) to the beginning of disk 1 (block A64), and from the end of disk 1 (block A91) to the beginning of disk 2 (block A92). If RAID 0 were used, then disk 0 and disk 2 would be truncated to 28 blocks, the size of the smallest disk in the array (disk 1), for a total size of 84 blocks.
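A minimal sketch of the capacity difference, using the drive sizes from the example above (the variable names are illustrative only):

    # Usable capacity of concatenation (JBOD/spanning) versus RAID 0 (illustrative).
    drive_sizes_gb = [3, 15, 5.5, 12]

    jbod_capacity = sum(drive_sizes_gb)                         # every byte of every drive is used
    raid0_capacity = len(drive_sizes_gb) * min(drive_sizes_gb)  # each drive truncated to the smallest

    print(jbod_capacity)   # 35.5 GB, as in the example above
    print(raid0_capacity)  # 12 GB: RAID 0 cannot use the space above 3 GB on the larger drives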
Some RAID controllers use JBOD to talk about working on drives without RAID features. Each drive shows up separately in the operating system. This JBOD is not the same as concatenation.
Many Linux systems use the terms "linear mode" or "append mode". The Mac OS X 10.4 implementation — called a "Concatenated Disk Set" — does not leave the user with any usable data on the remaining drives if one drive fails in a concatenated disk set, although the disks otherwise operate as described above.
Concatenation is one of the uses of the Logical Volume Manager in Linux. It can be used to create virtual drives.

Drive Clone

Most modern hard disks support a standard called Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.). S.M.A.R.T. makes it possible to monitor certain parameters of a hard disk drive. Some controllers make it possible to replace a single hard disk even before it fails, for example because S.M.A.R.T. or another disk test reports too many correctable errors. To do this, the controller copies all the data onto a hot spare drive. After that, the disk can be replaced by another one (which will simply become the new hot spare).

Standard levels
  • RAID 0 (block-level striping without parity or mirroring) has no (or zero) redundancy. It provides improved performance and additional storage but no fault tolerance. Hence simple stripe sets are normally referred to as RAID 0. Any drive failure destroys the array, and the likelihood of failure increases with more drives in the array (at a minimum, potential for catastrophic data loss is double that of isolated drives without RAID). A single drive failure destroys the entire array because when data are written to a RAID 0 volume, the data are broken into fragments called blocks. The number of blocks is dictated by the stripe size, which is a configuration parameter of the array. The blocks are written to their respective drives simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off each drive in parallel, increasing bandwidth. RAID 0 does not implement error checking, so any read error is uncorrectable. More drives in the array means higher bandwidth, but greater risk of data loss.
  • In RAID 1 (mirroring without parity or striping), data are written identically to two drives, thereby producing a "mirrored set"; the read request is serviced by either of the two drives containing the requested data, whichever one involves least seek time plus rotational latency. Similarly, a write request updates the stripes of both drives. The write performance depends on the slower of the two writes (i.e., the one that involves larger seek time and rotational latency); at least two drives are required to constitute such an array. While more constituent drives may be employed, many implementations deal with a maximum of only two. The array continues to operate as long as at least one drive is functioning. With appropriate operating system support, there can be increased read performance, and only a minimal write performance reduction; implementing RAID 1 with a separate controller for each drive in order to perform simultaneous reads (and writes) is sometimes called "multiplexing" (or "duplexing" when there are only two drives).
  • In RAID 10 (mirroring and striping), data are written in stripes across primary disks that have been mirrored to the secondary disks. A typical RAID 10 configuration consists of four drives, two for striping and two for mirroring. A RAID 10 configuration takes the best concepts of RAID 0 and RAID 1, and combines them to provide better performance along with the reliability of parity without actually having parity as with RAID 5 and RAID 6. RAID 10 is often referred to as RAID 1+0 (mirrored+striped) (see also Nested (hybrid) RAID below).
  • In RAID 2 (bit-level striping with dedicated Hamming-code parity), all disk spindle rotation is synchronized, and data are striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive. This theoretical RAID level is not used in practice.
  • In RAID 3 (byte-level striping with dedicated parity), all disk spindle rotation is synchronized, and data are striped so each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive. Although implementations exist,[9] RAID 3 is not commonly used in practice.
  • RAID 4 (block-level striping with dedicated parity) is equivalent to RAID 5 (see below) except that all parity data are stored on a single drive. In this arrangement files may be distributed between multiple drives. Each drive operates independently, allowing I/O requests to be performed in parallel. However, the use of a dedicated parity drive could create a performance bottleneck; because the parity data must be written to a single, dedicated parity drive for each block of non-parity data, the overall write performance may depend a great deal on the performance of this parity drive.
  • RAID 5 (block-level striping with distributed parity) distributes parity along with the data and requires all drives but one to be present to operate; the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. However, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt, because each block of the failed disk needs to be reconstructed by reading all other disks, i.e. the parity and the other data blocks of the RAID stripe. Additionally, there is the potentially disastrous RAID 5 write hole. RAID 5 requires at least three disks. (A sketch of how the parity rotates across drives appears after this list.)
  • RAID 6 (block-level striping with double distributed parity) provides fault tolerance of two drive failures; the array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems. This becomes increasingly important as large-capacity drives lengthen the time needed to recover from the failure of a single drive. Single-parity RAID levels are as vulnerable to data loss as a RAID 0 array until the failed drive is replaced and its data rebuilt; the larger the drive, the longer the rebuild takes. Double parity gives additional time to rebuild the array without the data being at risk if a single additional drive fails before the rebuild is complete. Like RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt.
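As a minimal sketch of how RAID 5 spreads parity across the members of the array (the rotation used here is only an assumption for illustration; real controllers differ in the exact layout they choose):

    # Illustrative RAID 5 layout: which drive holds parity for each stripe.
    num_drives = 4

    def raid5_stripe_layout(stripe):
        parity_drive = (num_drives - 1 - stripe) % num_drives   # parity rotates every stripe
        data_drives = [d for d in range(num_drives) if d != parity_drive]
        return data_drives, parity_drive

    for stripe in range(4):
        data, parity = raid5_stripe_layout(stripe)
        print(f"stripe {stripe}: data on drives {data}, parity on drive {parity}")

    # No single drive holds all the parity, so writes are not bottlenecked on one
    # dedicated parity drive as they are in RAID 4.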
Nested (hybrid) RAID
In what was originally termed hybrid RAID,[10] many storage controllers allow RAID levels to be nested. The elements of a RAID may be either individual drives or RAIDs themselves. However, if a RAID is itself an element of a larger RAID, it is unusual for its elements to be themselves RAIDs.
As there is no basic RAID level numbered larger than 9, nested RAIDs are usually clearly described by attaching the numbers indicating the RAID levels, sometimes with a "+" in between. The order of the digits in a nested RAID designation is the order in which the nested array is built: For a RAID 1+0, drives are first combined into multiple level 1 RAIDs that are themselves treated as single drives to be combined into a single RAID 0; the reverse structure is also possible (RAID 0+1).
The final RAID is known as the top array. When the top array is a RAID 0 (such as in RAID 1+0 and RAID 5+0), most vendors omit the "+" (yielding RAID 10 and RAID 50, respectively).
  • RAID 0+1: striped sets in a mirrored set (minimum four drives; even number of drives) provides fault tolerance and improved performance but increases complexity.
The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the data on the RAID system is lost.
  • RAID 1+0: (a.k.a. RAID 10) mirrored sets in a striped set (minimum four drives; even number of drives) provides fault tolerance and improved performance but increases complexity.
The key difference from RAID 0+1 is that RAID 1+0 creates a striped set from a series of mirrored drives. The array can sustain multiple drive losses so long as no mirror loses all its drives.[11] (A comparison of the two layouts is sketched after this list.)
  • RAID 5+3: mirrored striped set with distributed parity (some manufacturers label this as RAID 53).
A RAID controller might support upgrading a RAID 1 array to a RAID 1+0 array on the fly, but require a lengthy off-line rebuild to upgrade from RAID 1 to RAID 0+1.[citation needed]
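A minimal sketch comparing the fault tolerance of RAID 1+0 and RAID 0+1 with four drives, counting which two-drive failure combinations each layout survives (the drive grouping assumed here is purely illustrative):

    from itertools import combinations

    drives = ["A", "B", "C", "D"]

    def survives_raid10(failed):
        # RAID 1+0: mirrors (A,B) and (C,D) are striped; the array survives
        # as long as no mirror loses both of its drives.
        return not ({"A", "B"} <= failed or {"C", "D"} <= failed)

    def survives_raid01(failed):
        # RAID 0+1: stripes (A,B) and (C,D) are mirrored; the array survives
        # as long as at least one striped set is completely intact.
        return not (failed & {"A", "B"}) or not (failed & {"C", "D"})

    for name, check in [("RAID 1+0", survives_raid10), ("RAID 0+1", survives_raid01)]:
        ok = sum(check(set(pair)) for pair in combinations(drives, 2))
        print(name, ok, "of 6 two-drive failures survived")

    # RAID 1+0 survives 4 of the 6 combinations; RAID 0+1 survives only 2, because
    # any failure that touches both striped sets breaks both halves of the mirror.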





