RAID reliability

14. Jul '14

Assumptions

The reliability of every disk is assumed to be identical and constant:

\begin{equation*} P(D0) = P(D1) = P(D2) = P(D3) = 90 \% \end{equation*}

This does not take into account wear-out and other factors that change reliability over time.

Probability calculation

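Event A and B (this product rule holds only if A and B are independent, which is assumed for disk failures throughout):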

\begin{equation*} P(A \cap B) = P(A) \times P(B) \end{equation*}

Event A or B:

\begin{equation*} P(A \cup B) = P(A) + P(B) - P(A \cap B) \end{equation*}

Event A, B or C (see footnote 1):

\begin{equation*} P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \end{equation*}

Event A, B, C or D:

\begin{align*} P(A \cup B \cup C \cup D) &= P(A) + P(B) + P(C) + P(D) \\ &- P(A \cap B) - P(A \cap C) - P(A \cap D) - P(B \cap C) - P(B \cap D) - P(C \cap D) \\ &+ P(A \cap B \cap C) + P(A \cap B \cap D) + P(A \cap C \cap D) + P(B \cap C \cap D) \\ &- P(A \cap B \cap C \cap D) \end{align*}

For independent events it is usually easier to calculate the union step by step:

\begin{equation*} P(A \cup B \cup C) = P(A \cup B) + P(C) - P(C) \times P(A \cup B) \end{equation*}
\begin{equation*} P(A \cup B \cup C \cup D) = P(A \cup B \cup C) + P(D) - P(D) \times P(A \cup B \cup C) \end{equation*}
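The step-by-step rule can be checked against the full inclusion-exclusion sum with a few lines of Python. This is only a sketch under the independence assumption used above; the function names `union_stepwise` and `union_inclusion_exclusion` are made up for this illustration.

```python
from functools import reduce
from itertools import combinations
from math import prod


def union_stepwise(probs):
    """P(A1 or ... or An) for independent events, adding one event at a time."""
    # Each step applies P(X or Y) = P(X) + P(Y) - P(X) * P(Y).
    return reduce(lambda acc, p: acc + p - acc * p, probs, 0.0)


def union_inclusion_exclusion(probs):
    """Same probability via the full inclusion-exclusion sum."""
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized intersections, subtract even ones
        for subset in combinations(probs, k):
            # Independence: P of an intersection is the product of the parts.
            total += sign * prod(subset)
    return total


four_disks = [0.9, 0.9, 0.9, 0.9]
print(union_stepwise(four_disks))
print(union_inclusion_exclusion(four_disks))
```

Both calls should agree up to floating point rounding, at roughly 0.9999 for four disks of 90% reliability each.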
1 http://statistics.about.com/od/Formulas/a/Probability-Of-The-Union-Of-Three-Or-More-Sets.htm

RAID1

RAID1, also known as mirroring, writes the same blocks to two hard disks. The minimum setup requires two hard disks, so you pay for 8TB but get to use only 4TB:

[Figure: Diagram of a RAID1 setup. Disk 0 and Disk 1 each hold the same blocks A1 to A4.]

When both disks are operational:

\begin{equation*} P(D0 \cup D1) = P(D0) + P(D1) - P(D0 \cap D1) = 0.9 + 0.9 - 0.81 = 0.99 = 99 \% \end{equation*}

Once one disk fails, the reliability of the array drops to the reliability of the remaining disk (90%).
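As a quick numeric check of both cases, here is a minimal Python sketch. The helper name `mirror` is an assumption made for this example, and a failed disk is modelled as a reliability of zero.

```python
def mirror(p_a, p_b):
    """Probability that at least one of two independent disks works."""
    return p_a + p_b - p_a * p_b


P_DISK = 0.9  # per-disk reliability from the Assumptions section

raid1_healthy = mirror(P_DISK, P_DISK)  # both disks operational
raid1_degraded = mirror(0.0, P_DISK)    # Disk 0 failed, Disk 1 remaining

print(f"RAID1 healthy:  {raid1_healthy:.2%}")   # 99.00%
print(f"RAID1 degraded: {raid1_degraded:.2%}")  # 90.00%
```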

RAID0

RAID0, also known as striping, doubles throughput at the expense of reliability. The minimum setup requires two disks, and you get to use all of the available space:

[Figure: Diagram of a RAID0 setup. Blocks A1 to A8 are striped across Disk 0 and Disk 1.]

When both disks are operational:

\begin{equation*} P(D0 \cap D1) = P(D0) \times P(D1) = 0.9 \times 0.9 = 0.81 = 81 \% \end{equation*}

If one of the disks fails, the data on the remaining disk is useless.
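The same numbers fall out of a short Python sketch. The helper name `stripe` is hypothetical, and a failed disk is again modelled as a reliability of zero, which makes the whole product collapse to zero.

```python
from math import prod


def stripe(*disks):
    """Probability that every independent disk in the stripe works."""
    return prod(disks)


P_DISK = 0.9  # per-disk reliability from the Assumptions section

raid0_healthy = stripe(P_DISK, P_DISK)  # both disks operational
raid0_one_failed = stripe(0.0, P_DISK)  # Disk 0 failed: the array is lost

print(f"RAID0 healthy:    {raid0_healthy:.2%}")     # 81.00%
print(f"RAID0 one failed: {raid0_one_failed:.2%}")  # 0.00%
```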

RAID10

RAID10, also known as a stripe of mirrors, requires a minimum of four drives, which means you pay for 16TB but can use only 8TB.

[Figure: Diagram of a RAID10 setup. Four disks (Disk 0 to Disk 3) form two RAID 1 mirrors, which are combined into a RAID 0 stripe.]

Reliability of the array when all disks are operational:

\begin{equation*} P = P((D0 \cup D1) \cap (D2 \cup D3)) = (P(D0) + P(D1) - P(D0) \times P(D1)) \times (P(D2) + P(D3) - P(D2) \times P(D3)) \end{equation*}
\begin{equation*} P = (0.9 + 0.9 - 0.9 \times 0.9) \times (0.9 + 0.9 - 0.9 \times 0.9) = 0.99 \times 0.99 = 0.9801 \approx 98 \% \end{equation*}

To get the reliability of the array when one disk goes down, substitute zero for the reliability of the first disk, P(D0):

\begin{equation*} P_{degraded} = (0 + 0.9 - 0.9 \times 0) \times (0.9 + 0.9 - 0.9 \times 0.9) = 0.9 \times (0.9 + 0.9 - 0.9 \times 0.9) \end{equation*}
\begin{equation*} P_{degraded} = 0.9 \times 0.99 = 0.891 = 89.1 \% \end{equation*}

If the other disk of the same mirror (D1) also goes down, the whole array is lost, which is reflected by substituting zero for P(D1) as well. A failing disk in the other mirror (e.g. D2) instead leaves one disk per mirror, which brings you down to the RAID0 reliability of 81%.
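The failure scenarios for RAID10 can be tabulated with a small Python sketch. The helper names `mirror` and `raid10` are assumptions made for this example; D0/D1 and D2/D3 are taken to be the two mirrors, as in the formula above, and a failed disk gets a reliability of zero.

```python
def mirror(p_a, p_b):
    """Probability that at least one of two independent disks works."""
    return p_a + p_b - p_a * p_b


def raid10(d0, d1, d2, d3):
    """Stripe of mirrors: both mirrors (D0/D1 and D2/D3) must survive."""
    return mirror(d0, d1) * mirror(d2, d3)


P = 0.9  # per-disk reliability from the Assumptions section

raid10_scenarios = {
    "all disks operational": raid10(P, P, P, P),
    "D0 failed": raid10(0, P, P, P),
    "D0 and D1 failed (same mirror)": raid10(0, 0, P, P),
    "D0 and D2 failed (different mirrors)": raid10(0, P, 0, P),
}

for name, value in raid10_scenarios.items():
    print(f"{name}: {value:.2%}")
```

This should print 98.01%, 89.10%, 0.00% and 81.00%, in that order.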

RAID01

RAID01 is a mirror of stripes:

[Figure: Diagram of a RAID01 setup. Four disks (Disk 0 to Disk 3) form two RAID 0 stripes, which are mirrored by RAID 1.]

With all disks operational, the reliability is slightly lower than that of RAID10:

\begin{equation*} P = P((D0 \cap D1) \cup (D2 \cap D3)) = P(D0) \times P(D1) + P(D2) \times P(D3) - P(D0) \times P(D1) \times P(D2) \times P(D3) \end{equation*}
\begin{equation*} P = (0.9 \times 0.9 + 0.9 \times 0.9 - 0.9 \times 0.9 \times 0.9 \times 0.9) = 0.81 + 0.81 - 0.6561 = 0.9639 \approx 96.4 \% \end{equation*}

The difference becomes more pronounced in degraded mode, again with P(D0) set to zero:

\begin{equation*} P_{degraded} = (0 \times 0.9 + 0.9 \times 0.9 - 0 \times 0.9 \times 0.9 \times 0.9) = 0.9 \times 0.9 = 0.81 = 81 \% \end{equation*}

Again, if another disk dies in the other stripe, the array is dead. If another disk dies in the same stripe, which is already lost, the reliability remains at 81%.
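The RAID01 scenarios can be tabulated the same way. In this sketch the helper names `stripe`, `either` and `raid01` are hypothetical; D0/D1 and D2/D3 are taken to be the two stripes, as in the formula above, and a failed disk gets a reliability of zero.

```python
def stripe(p_a, p_b):
    """Probability that both independent disks of a stripe work."""
    return p_a * p_b


def either(p_a, p_b):
    """Probability that at least one of two independent parts works."""
    return p_a + p_b - p_a * p_b


def raid01(d0, d1, d2, d3):
    """Mirror of stripes: at least one stripe (D0/D1 or D2/D3) must survive."""
    return either(stripe(d0, d1), stripe(d2, d3))


P = 0.9  # per-disk reliability from the Assumptions section

raid01_scenarios = {
    "all disks operational": raid01(P, P, P, P),
    "D0 failed": raid01(0, P, P, P),
    "D0 and D1 failed (same stripe)": raid01(0, 0, P, P),
    "D0 and D2 failed (different stripes)": raid01(0, P, 0, P),
}

for name, value in raid01_scenarios.items():
    print(f"{name}: {value:.2%}")
```

This should print 96.39%, 81.00%, 81.00% and 0.00%, in that order.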

Conclusion

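This should explain the differences between RAID10 and RAID01 reasonably well. With all four disks operational, RAID01 is slightly less reliable than RAID10 (96.4% versus 98%), and after a single disk failure the gap widens (81% versus 89.1%).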
