Erasure Codes with Small Overhead Factor and Their Distributed Storage Applications

In this paper, we consider a family of XOR-based erasure codes with finite-sized, randomly generated parity-check matrices, and report the results of a thorough computational search for erasure codes suited to distributed storage applications. Although the discovered matrices are not "low density" and the resulting codes are only approximately maximum distance separable (MDS), they outperform other codes, such as LDPC and irregular repeat-accumulate (IRA) codes, in terms of the overhead factor, that is, the average ratio of the number of encoded file blocks needed to restore lost blocks to the number of original file blocks. We designed our codes to keep the overhead factor small. While typical LDPC codes use matrices with several thousand rows, our codes use matrices with only one thousand rows, to keep operation time and overhead practical. Because no method is known for selecting the most suitable matrix from a large set of candidates, we ran a long Monte Carlo simulation to find a matrix with the lowest overhead factor. We discovered a family of erasure codes with an average overhead factor of 1.002, compared to 1.07 for typical LDPC codes, when the number of rows is 1000.
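To make the overhead factor and the Monte Carlo search concrete, the following is a minimal sketch rather than the construction used in the paper. It assumes a systematic code in which each parity block is the XOR of a uniformly random subset of the k data blocks, and it assumes decoding succeeds as soon as the received blocks span all k data blocks over GF(2). The function names (`insert_into_basis`, `random_code`, `blocks_needed`, `overhead_factor`, `search`) and the toy sizes are illustrative only.

```python
import random

def insert_into_basis(basis, row):
    """Reduce `row` (a GF(2) vector stored as an int bitmask) against `basis`,
    a dict keyed by leading-bit position. Return True if the rank grew."""
    while row:
        lead = row.bit_length() - 1
        if lead not in basis:
            basis[lead] = row
            return True
        row ^= basis[lead]  # cancel the leading bit and keep reducing
    return False

def random_code(k, m, rng):
    """Assumed construction: k systematic data blocks plus m parity blocks,
    each parity being the XOR of a random subset of the data blocks."""
    rows = [1 << i for i in range(k)]
    rows += [rng.getrandbits(k) for _ in range(m)]
    return rows

def blocks_needed(rows, k, rng):
    """Receive blocks in random order; return how many arrive before the
    received rows reach rank k, i.e. before all data blocks are recoverable."""
    order = rows[:]
    rng.shuffle(order)
    basis, rank = {}, 0
    for count, row in enumerate(order, start=1):
        if insert_into_basis(basis, row):
            rank += 1
            if rank == k:
                return count
    return None  # unreachable for a systematic code, kept for safety

def overhead_factor(rows, k, trials, rng):
    """Monte Carlo estimate: average blocks needed divided by k."""
    total = sum(blocks_needed(rows, k, rng) for _ in range(trials))
    return total / (trials * k)

def search(k, m, candidates, trials, rng):
    """Generate random codes and keep the one with the lowest estimate."""
    best_rows, best_f = None, float("inf")
    for _ in range(candidates):
        rows = random_code(k, m, rng)
        f = overhead_factor(rows, k, trials, rng)
        if f < best_f:
            best_rows, best_f = rows, f
    return best_rows, best_f

if __name__ == "__main__":
    rng = random.Random(0)
    _, f = search(k=50, m=50, candidates=10, trials=200, rng=rng)
    print(f"best estimated overhead factor: {f:.4f}")
```

An overhead factor of exactly 1 would mean that any k received blocks suffice, which is the MDS property; values slightly above 1 correspond to the "approximately MDS" behavior described above. The sizes here are far smaller than the 1000-row matrices discussed in the abstract and are chosen only so the sketch runs quickly.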
