Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs

Large-scale storage systems require multi-disk fault-tolerant erasure codes. Replication and RAID extensions that protect against two- and three-disk failures offer a stark tradeoff between how much data must be stored and how much data must be read to recover a failed disk. Flat XOR-codes, erasure codes in which each parity disk is calculated as the XOR of some subset of data disks, offer a tradeoff between these extremes. In this paper, we describe constructions of two novel flat XOR-codes: Stepped Combination codes and HD-Combination codes. We describe an algorithm for flat XOR-codes that enumerates recovery equations, i.e., sets of disks that can recover a failed disk. We also describe two algorithms for flat XOR-codes that generate recovery schedules, i.e., sets of recovery equations that can be used in concert to achieve efficient recovery. Finally, we analyze the key storage properties of many flat XOR-codes and of MDS codes such as replication and RAID 6 to show the cost-benefit tradeoff gap that flat XOR-codes can fill.
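
To make the terms "flat XOR-code" and "recovery equation" concrete, the following is a minimal illustrative sketch, not the paper's algorithm and not one of its constructions. The layout in PARITY_SETS (four data disks, three parity disks) and the function names are hypothetical. The enumeration simply XORs together every subset of parity equations by brute force and keeps the results that involve the failed disk.

```python
from functools import reduce
from itertools import combinations

# Hypothetical toy layout (an assumption, not a code from the paper):
# each parity disk is the XOR of a subset of the data disks.
PARITY_SETS = {
    "p0": {"d0", "d1", "d2"},
    "p1": {"d0", "d1", "d3"},
    "p2": {"d0", "d2", "d3"},
}


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data):
    """Return a stripe: the data blocks plus one XOR parity block per parity disk."""
    stripe = dict(data)
    for parity, members in PARITY_SETS.items():
        stripe[parity] = xor_blocks([data[d] for d in sorted(members)])
    return stripe


def recovery_equations(failed):
    """Brute force: sets of other disks whose XOR equals the failed disk."""
    # Each parity equation {parity disk} | members XORs to zero.
    equations = [frozenset({p}) | frozenset(m) for p, m in PARITY_SETS.items()]
    found = set()
    for r in range(1, len(equations) + 1):
        for combo in combinations(equations, r):
            # XOR of equations = symmetric difference of their disk sets.
            combined = reduce(lambda a, b: a ^ b, combo)
            if failed in combined:
                found.add(combined - {failed})
    return sorted(found, key=lambda s: (len(s), sorted(s)))


def recover(stripe, equation):
    """Rebuild a lost block by XORing the blocks named in a recovery equation."""
    return xor_blocks([stripe[d] for d in sorted(equation)])
```

In this toy layout a single failed disk has several recovery equations of different sizes. A recovery schedule, in the abstract's sense, would choose among such equations across stripes; that selection is not attempted here.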
