Constructing double- and triple-erasure-correcting codes with high availability using mirroring and parity approaches

As hard-disk capacity grows rapidly while disk speed and MTTF improve only slowly, and as storage systems grow larger, the reliability and availability of storage systems have become increasingly serious concerns. This paper discusses in detail how to construct double- and triple-erasure-correcting codes by combining mirroring and parity approaches, and presents a double-erasure code, MPDC, and a triple-erasure code, MPPDC, both based on one-factorizations of complete graphs. The two codes are simple, easy to implement, and place no restriction on the number of disks. They achieve perfect load balance in fault-free mode and approximately optimal load balance during reconstruction. Simulation results show that, compared with other double- and triple-erasure codes, MPDC and MPPDC offer comparable performance under light and moderate loads and better performance under heavy load in fault-free mode. Because they use parity declustering, the two codes far outperform the other double- and triple-erasure codes in degraded and reconstruction modes.
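For readers unfamiliar with the combinatorial object named above, the following is a minimal sketch of a one-factorization of a complete graph with an even number of vertices, built with the standard round-robin ("circle") method. It is not the MPDC or MPPDC layout itself, since the abstract does not give that construction. The function names `one_factorization` and `is_one_factorization` are hypothetical, and reading vertices as disks is only an assumption made for illustration.

```python
from itertools import combinations


def one_factorization(num_vertices):
    """Return a one-factorization of the complete graph K_n (n even).

    Uses the classic round-robin construction: vertices 0..n-2 sit on a
    circle and vertex n-1 is held fixed. The result is a list of n-1
    perfect matchings (one-factors), each a sorted list of edges (a, b)
    with a < b.
    """
    if num_vertices < 2 or num_vertices % 2 != 0:
        raise ValueError("num_vertices must be an even integer >= 2")
    m = num_vertices - 1  # number of vertices on the circle (odd)
    factors = []
    for i in range(m):
        # The fixed vertex m is matched with circle vertex i.
        factor = [(i, m)]
        # The remaining vertices are paired symmetrically around i.
        for k in range(1, num_vertices // 2):
            a = (i - k) % m
            b = (i + k) % m
            factor.append((min(a, b), max(a, b)))
        factors.append(sorted(factor))
    return factors


def is_one_factorization(factors, num_vertices):
    """Check that the factors are perfect matchings partitioning all edges."""
    all_edges = set(combinations(range(num_vertices), 2))
    seen = set()
    for factor in factors:
        # Each factor must touch every vertex exactly once.
        covered = sorted(v for edge in factor for v in edge)
        if covered != list(range(num_vertices)):
            return False
        for edge in factor:
            if edge in seen:
                return False  # an edge may appear in only one factor
            seen.add(edge)
    return seen == all_edges


if __name__ == "__main__":
    for factor in one_factorization(6):
        print(factor)
```

Every pair of vertices appears in exactly one factor, and every factor touches each vertex exactly once. If vertices are read as disks, that regularity is the kind of property a layout could use to spread load evenly, though how MPDC and MPPDC actually map factors to mirrored and parity units is not specified in the abstract.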
