Shortcut method for reliability comparisons in RAID

Abstract: Given that the reliability of each disk in a disk array during its useful lifetime is $r = 1 - \epsilon$ with $\epsilon \ll 1$, we show that the reliability of a RAID disk array tolerating all possible $n - 1$ disk failures can be specified as $R \approx 1 - a_n \epsilon^n$, where $a_n$ is the smallest nonzero coefficient in the corresponding asymptotic expansion, e.g., for $n$-way replication $R = 1 - \epsilon^n$. We compare the reliability of several mirrored disk organizations, which provide tradeoffs between reliability and load balance after a disk failure, by comparing their $a_2$ values, which can be obtained via a partial reliability analysis that takes only a few disk failures into account. We next use asymptotic expansions to compare the reliability of hierarchical RAID disk arrays, which combine replication and rotated parity disk arrays (RAID5 and RAID6). Finally, we argue that the mean time to data loss in systems with repair is related to the reliability without repair. As part of this discussion we show how to estimate the mean time to data loss in RAID5 and RAID6 disk arrays without resorting to transient analysis.
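
The asymptotic form above can be checked numerically. The following is a minimal sketch, not the paper's own derivation: it assumes independent disk failures, uses the standard closed-form reliabilities for a single RAID5 array (survives at most one failure) and for basic mirroring (data is lost only if both disks of a pair fail), and compares them with $1 - a_2 \epsilon^2$. The function names, the array size of 8 disks, and the value of $\epsilon$ are illustrative assumptions.

```python
from math import comb

def raid5_reliability(n_disks: int, eps: float) -> float:
    """Exact reliability of one RAID5 array: zero or one failed disk."""
    r = 1.0 - eps
    return r ** n_disks + n_disks * r ** (n_disks - 1) * eps

def mirrored_pairs_reliability(n_pairs: int, eps: float) -> float:
    """Exact reliability of basic mirroring: no pair loses both disks."""
    return (1.0 - eps ** 2) ** n_pairs

def asymptotic_reliability(a: float, order: int, eps: float) -> float:
    """Leading-order approximation R ~ 1 - a * eps**order."""
    return 1.0 - a * eps ** order

# Illustrative (assumed) parameters: 8 disks, per-disk unreliability 1e-3.
eps = 1e-3
n_disks = 8

# RAID5: any 2 of the n disks failing loses data, so a_2 = C(n, 2).
a2_raid5 = comb(n_disks, 2)
# Basic mirroring: only the n/2 mirrored pairs are fatal, so a_2 = n / 2.
a2_mirror = n_disks // 2

exact_raid5 = raid5_reliability(n_disks, eps)
approx_raid5 = asymptotic_reliability(a2_raid5, 2, eps)
exact_mirror = mirrored_pairs_reliability(n_disks // 2, eps)
approx_mirror = asymptotic_reliability(a2_mirror, 2, eps)

print(f"RAID5:     a_2 = {a2_raid5}, exact = {exact_raid5:.9f}, approx = {approx_raid5:.9f}")
print(f"Mirroring: a_2 = {a2_mirror}, exact = {exact_mirror:.9f}, approx = {approx_mirror:.9f}")
```

With the same number of disks, the organization with the smaller $a_2$ is the more reliable one to leading order, which is the comparison the abstract describes; here mirroring has the smaller coefficient, at the cost of lower storage efficiency.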
