Optimal recovery of single disk failure in RDP code storage systems

Modern storage systems use thousands of inexpensive disks to meet the storage requirements of applications. To enhance data availability, some form of redundancy is used. For example, conventional RAID-5 systems tolerate only a single disk failure, while more recent coding techniques such as row-diagonal parity (RDP) keep data available with up to two disk failures. To reduce the probability of data unavailability, disk recovery (or rebuild) is carried out whenever a single disk fails. We show that the conventional recovery scheme of RDP code for a single disk failure is inefficient and suboptimal. In this paper, we propose an optimal and efficient disk recovery scheme, Row-Diagonal Optimal Recovery (RDOR), for single disk failure of RDP code. RDOR has the following properties: (1) it is read optimal, in the sense that it issues the smallest number of disk reads needed to recover the failed disk; (2) it is load balanced, in that all surviving disks are subjected to the same amount of additional workload while the failed disk is rebuilt. We carefully explore the design state space and theoretically show the optimality of RDOR. We carry out a performance evaluation to quantify the merits of RDOR on some widely used disks.
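To make the read-count claim concrete, the following minimal sketch counts the blocks read when one data disk of an RDP stripe is rebuilt. It is not the RDOR algorithm itself: it simply brute-forces, for each lost block, the choice between its row parity set and its diagonal parity set, and counts each block read only once. The stripe layout is an assumption taken from the standard RDP construction (prime p, data disks 0..p-2, row parity on disk p-1, diagonal parity on disk p, p-1 rows, block (r, c) on diagonal (r + c) mod p, diagonal p-1 not stored). The function names `read_set`, `min_reads`, and `conventional_reads` are hypothetical, and load balancing is not modelled.

```python
from itertools import product


def read_set(p, failed, use_diag):
    """Return the set of (row, disk) blocks read to rebuild data disk `failed`.

    use_diag[r] is True if the lost block in row r is rebuilt from its
    diagonal parity set, False if it is rebuilt from its row parity set.
    Returns None if the choice is infeasible (assumed layout: the diagonal
    numbered p-1 has no stored parity).
    """
    reads = set()
    for r in range(p - 1):
        d = (r + failed) % p  # diagonal that the lost block (r, failed) lies on
        if use_diag[r]:
            if d == p - 1:
                return None  # missing diagonal: only row parity can be used
            for c in range(p):  # data disks and the row parity disk
                if c == failed:
                    continue
                rr = (d - c) % p  # row of the block of diagonal d on disk c
                if rr <= p - 2:  # row p-1 does not exist in the stripe
                    reads.add((rr, c))
            reads.add((d, p))  # the diagonal parity block itself
        else:
            for c in range(p):
                if c != failed:
                    reads.add((r, c))
    return reads


def conventional_reads(p, failed):
    """Reads issued when every lost block is rebuilt from row parity only."""
    return len(read_set(p, failed, [False] * (p - 1)))


def min_reads(p, failed):
    """Smallest read count over all row/diagonal choices (exhaustive search)."""
    best = None
    for choice in product([False, True], repeat=p - 1):
        reads = read_set(p, failed, choice)
        if reads is not None and (best is None or len(reads) < best):
            best = len(reads)
    return best
```

Under these assumptions, `conventional_reads(5, 0)` gives 16 while `min_reads(5, 0)` gives 12, because blocks shared between a row parity set and a diagonal parity set are read only once. The exhaustive search is exponential in p and is meant only for small illustrative stripes.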
