Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications

In the past few years, all manner of storage systems, ranging from disk array systems to distributed and wide-area systems, have started to grapple with the reality of tolerating multiple simultaneous failures of storage nodes. Unlike the single failure case, which is optimally handled with RAID Level-5 parity, the multiple failure case is more difficult because optimal general purpose strategies are not yet known. Erasure coding is the field of research that deals with these strategies, and this field has blossomed in recent years. Despite this research, the decades-old strategy of Reed-Solomon coding remains the only space-optimal (MDS) code for all but the smallest storage systems. The best performing implementations of Reed-Solomon coding employ a variant called Cauchy Reed-Solomon coding, developed in the mid 1990s [BKK 95]. In this paper, we present an improvement to Cauchy Reed-Solomon coding that is based on optimizing the Cauchy distribution matrix. We detail an algorithm for generating good matrices and then evaluate the performance of encoding using all manner of Reed-Solomon coding, plus the best MDS codes from the literature. The improvements over the original Cauchy Reed-Solomon codes are as much as 83% in realistic scenarios, and average roughly 10% over all cases that we tested.

[1]  R. Chien,et al.  Error-Correcting Codes, Second Edition , 1973, IEEE Transactions on Communications.

[2]  F. MacWilliams,et al.  The Theory of Error-Correcting Codes , 1977 .

[3]  Michael O. Rabin,et al.  Efficient dispersal of information for security, load balancing, and fault tolerance , 1989, JACM.

[4]  Shirley Dex,et al.  On the operation and management of the JR integrated passenger ticket sales system (MARS) , 1991 .

[5]  Garth A. Gibson,et al.  RAID: high-performance, reliable secondary storage , 1994, CSUR.

[6]  Randy H. Katz,et al.  RAID: high-performance, reliable secondary storage , 1994 .

[7]  Jehoshua Bruck,et al.  EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures , 1995, IEEE Trans. Computers.

[8]  Marek Karpinski,et al.  An XOR-based erasure-resilient coding scheme , 1995 .

[9]  James S. Plank,et al.  A tutorial on Reed–Solomon coding for fault‐tolerance in RAID‐like systems , 1997, Softw. Pract. Exp..

[10]  Daniel A. Spielman,et al.  Practical loss-resilient codes , 1997, STOC '97.

[11]  Luigi Rizzo,et al.  Effective erasure codes for reliable computer communication protocols , 1997, CCRV.

[12]  Michael Luby,et al.  A digital fountain approach to reliable distribution of bulk data , 1998, SIGCOMM '98.

[13]  Jehoshua Bruck,et al.  X-Code: MDS Array Codes with Optimal Encoding , 1999, IEEE Trans. Inf. Theory.

[14]  Stephen B. Wicker,et al.  Reed-Solomon Codes and Their Applications , 1999 .

[15]  Michael Mitzenmacher,et al.  Accessing multiple mirror sites in parallel: using Tornado codes to speed up downloads , 1999, IEEE INFOCOM '99.

[16]  Witold Litwin,et al.  LH*RS: a high-availability scalable distributed data structure using Reed Solomon Codes , 2000, SIGMOD '00.

[17]  David A. Bader,et al.  Facial Expression Recognition System using Statistical Feature and Neural Network , 2012 .

[18]  Ben Y. Zhao,et al.  Maintenance-Free Global Data Storage , 2001, IEEE Internet Comput..

[19]  Michael Luby,et al.  LT codes , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.

[20]  Micah Beck,et al.  Fault-tolerance in the network storage stack , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[21]  Zheng Zhang,et al.  Reperasure: replication protocol using erasure-code in peer-to-peer storage network , 2002, 21st IEEE Symposium on Reliable Distributed Systems, 2002.

[22]  Stephen B. Wicker,et al.  Fundamentals of Codes, Graphs, and Iterative Decoding , 2002 .

[23]  Michael K. Reiter,et al.  Efficient Byzantine-tolerant erasure-coded storage , 2004, International Conference on Dependable Systems and Networks, 2004.

[24]  Jin Li.  PeerStreaming: A Practical Receiver-Driven Peer-to-Peer Media Streaming System , 2004 .

[25]  Arif Merchant,et al.  A decentralized algorithm for erasure-coded virtual disks , 2004, International Conference on Dependable Systems and Networks, 2004.

[26]  James S. Plank,et al.  A practical analysis of low-density parity-check erasure codes for wide-area storage applications , 2004, International Conference on Dependable Systems and Networks, 2004.

[27]  D. M. Chiu,et al.  Erasure code replication revisited , 2004, Fourth International Conference on Peer-to-Peer Computing, 2004.

[28]  Michael Mitzenmacher,et al.  Digital fountains: a survey and look forward , 2004, Information Theory Workshop.

[29]  James S. Plank,et al.  Assessing the performance of erasure codes in the wide-area , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[30]  Ying Ding,et al.  Note: Correction to the 1997 tutorial on Reed–Solomon coding , 2005, Softw. Pract. Exp..

[31]  James Lee Hafner,et al.  WEAVER codes: highly fault tolerant erasure codes for storage systems , 2005, FAST'05.

[32]  James S. Plank.  Enumeration of Optimal and Good Cauchy Matrices for Reed-Solomon Coding , 2005 .

[33]  Jérôme Lacan,et al.  Content-access QoS in peer-to-peer networks using a fast MDS erasure code , 2005, Comput. Commun..

[34]  James Lee Hafner,et al.  HoVer Erasure Codes For Disk Arrays , 2006, International Conference on Dependable Systems and Networks (DSN'06).

[35]  Andrew A. Chien,et al.  RobuSTore: Robust Performance for Distributed Storage Systems , 2006 .

[36]  Amin Shokrollahi,et al.  Raptor codes , 2011, IEEE Transactions on Information Theory.