Analysis of a new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors

Today's data storage systems increasingly adopt low-cost disk drives that have higher capacity but lower reliability, leading to more frequent rebuilds and a higher risk of unrecoverable media errors. We propose a new XOR-based intra-disk redundancy scheme, called interleaved parity check (IPC), that enhances the reliability of RAID systems while incurring only negligible I/O performance degradation. The proposed scheme introduces an additional level of redundancy inside each disk, on top of the RAID redundancy across multiple disks. The RAID parity protects against disk failures, while the proposed scheme protects against media-related unrecoverable errors. We develop a new model that captures the effect of correlated unrecoverable sector errors and use it to analyze the proposed scheme as well as traditional redundancy schemes based on Reed-Solomon (RS) codes and single-parity-check (SPC) codes. We derive closed-form expressions for the mean time to data loss (MTTDL) of RAID 5 and RAID 6 systems in the presence of unrecoverable errors and disk failures. We then combine these results into a comprehensive characterization of the reliability of RAID systems that incorporate the proposed IPC redundancy scheme. Our results show that, in the practical case of correlated errors, the proposed scheme provides the same reliability as the optimum, albeit more complex, RS coding scheme. Finally, the throughput impact of incorporating intra-disk redundancy into various RAID systems is evaluated by means of event-driven simulations. A detailed description of these contributions is given in [1].
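The abstract does not spell out the IPC layout, so the following is only a minimal sketch of a generic XOR interleaved parity, not the authors' exact construction. It assumes that the data sectors of one segment are assigned round-robin to m interleaves and that one parity sector per interleave holds the XOR of its members. The names `ipc_parity` and `ipc_recover`, the segment size, and the interleave rule `i % m` are all hypothetical choices made for illustration.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ipc_parity(data_sectors, m):
    """Compute m parity sectors for one segment.

    Assumption (not from the source): data sector i belongs to
    interleave i % m, and parity j is the XOR of interleave j.
    """
    size = len(data_sectors[0])
    parity = [bytes(size) for _ in range(m)]
    for i, sector in enumerate(data_sectors):
        parity[i % m] = xor_bytes(parity[i % m], sector)
    return parity


def ipc_recover(data_sectors, parity, m, lost):
    """Rebuild the sectors whose indices are listed in `lost`.

    Works only if each interleave contains at most one lost sector;
    the contents of lost sectors are never read.
    """
    lost_by_interleave = {}
    for i in lost:
        lost_by_interleave.setdefault(i % m, []).append(i)

    recovered = {}
    for j, indices in lost_by_interleave.items():
        if len(indices) > 1:
            raise ValueError(f"interleave {j} has more than one lost sector")
        missing = indices[0]
        acc = parity[j]
        # XOR the parity with every surviving member of interleave j.
        for i in range(j, len(data_sectors), m):
            if i != missing:
                acc = xor_bytes(acc, data_sectors[i])
        recovered[missing] = acc
    return recovered
```

Under these assumptions, any burst of up to m consecutive unreadable sectors touches each interleave at most once and can therefore be rebuilt inside the disk, which is why an interleaved layout suits correlated (bursty) sector errors. Two lost sectors that fall into the same interleave cannot be rebuilt by this sketch and would have to be handled by the RAID-level parity.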

[1]  Kishor S. Trivedi,et al.  Reliability Analysis of Redundant Arrays of Inexpensive Disks , 1993, J. Parallel Distributed Comput..

[2]  Peter F. Corbett,et al.  Row-Diagonal Parity for Double Disk Failure Correction , 2004 .

[3]  Scott A. Brandt,et al.  Reliability mechanisms for very large storage systems , 2003, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003).

[4]  Donald F. Towsley,et al.  A Performance Evaluation of RAID Architectures , 1996, IEEE Trans. Computers.

[5]  Kishor S. Trivedi.  Probability and Statistics with Reliability, Queuing, and Computer Science Applications , 1984 .

[6]  Joseph F. Murray,et al.  Reliability and security of RAID storage systems and D2D archives using SATA disk drives , 2005, TOS.

[7]  Randy H. Katz,et al.  A case for redundant arrays of inexpensive disks (RAID) , 1988, SIGMOD '88.

[8]  Kishor S. Trivedi,et al.  Data Integrity Analysis of Disk Array Systems with Analytic Modeling of Coverage , 1995, Perform. Evaluation.

[9]  Garth A. Gibson,et al.  RAID: high-performance, reliable secondary storage , 1994, CSUR.

[10]  Jie Li,et al.  Reliability analysis of disk array organizations by considering uncorrectable bit errors , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.

[11]  Jehoshua Bruck,et al.  EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures , 1995, IEEE Trans. Computers.

[12]  Randy H. Katz,et al.  How reliable is a RAID? , 1989, Digest of Papers. COMPCON Spring 89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage.

[13]  Arif Merchant,et al.  Issues and challenges in the performance analysis of real disk arrays , 2004, IEEE Transactions on Parallel and Distributed Systems.

[14]  Dirk Beyer,et al.  Designing for Disasters , 2004, FAST.

[15]  Walter A. Burkhard,et al.  Disk array storage system reliability , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[16]  Huaxia Xia,et al.  RobuSTore: a distributed storage architecture with robust and high performance , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[17]  Leonard Kleinrock,et al.  Queueing Systems: Volume I-Theory , 1975 .

[18]  John Wilkes,et al.  An introduction to disk drive modeling , 1994, Computer.