Rebuilding for array codes in distributed storage systems

In distributed storage systems that use coding, the issue of minimizing the communication required to rebuild a storage node after a failure arises. We consider the problem of repairing an erased node in a distributed storage system that uses an EVENODD code. EVENODD codes are maximum distance separable (MDS) array codes that are used to protect against erasures, and only require XOR operations for encoding and decoding. We show that when there are two redundancy nodes, to rebuild one erased systematic node, only 3/4 of the information needs to be transmitted. Interestingly, in many cases, the required disk I/O is also minimized.

[1]  Peter F. Corbett,et al.  Row-Diagonal Parity for Double Disk Failure Correction (Awarded Best Paper!) , 2004, USENIX Conference on File and Storage Technologies.

[2]  John C. S. Lui,et al.  Optimal recovery of single disk failure in RDP code storage systems , 2010, SIGMETRICS '10.

[3]  Alexandros G. Dimakis,et al.  Searching for Minimum Storage Regenerating Codes , 2009, ArXiv.

[4]  Alexander Vardy,et al.  MDS array codes with independent parity symbols , 1995, Proceedings of 1995 IEEE International Symposium on Information Theory.

[5]  Yunnan Wu Existence and construction of capacity-achieving network codes for distributed storage , 2009, 2009 IEEE International Symposium on Information Theory.

[6]  Yunnan Wu,et al.  A Survey on Network Codes for Distributed Storage , 2010, Proceedings of the IEEE.

[7]  Jehoshua Bruck,et al.  EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures , 1995, IEEE Trans. Computers.

[8]  Yunnan Wu,et al.  Reducing repair traffic for erasure coding-based storage via interference alignment , 2009, 2009 IEEE International Symposium on Information Theory.

[9]  Jehoshua Bruck,et al.  Low density MDS codes and factors of complete graphs , 1998, Proceedings. 1998 IEEE International Symposium on Information Theory (Cat. No.98CH36252).

[10]  Kannan Ramchandran,et al.  Exact Regeneration Codes for Distributed Storage Repair Using Interference Alignment , 2009, ArXiv.

[11]  A. Dimakis,et al.  Deterministic Regenerating Codes for Distributed Storage Yunnan , 2007 .

[12]  Yunnan Wu Existence and Construction of Capacity-Achieving Network Codes for Distributed Storage , 2010, IEEE Journal on Selected Areas in Communications.

[13]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[14]  Kannan Ramchandran,et al.  Explicit construction of optimal exact regenerating codes for distributed storage , 2009, 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[15]  Cheng Huang,et al.  STAR : An Efficient Coding Scheme for Correcting Triple Storage Node Failures , 2005, IEEE Transactions on Computers.

[16]  Syed Ali Jafar,et al.  Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient , 2010, ArXiv.

[17]  Jehoshua Bruck,et al.  X-Code: MDS Array Codes with Optimal Encoding , 1999, IEEE Trans. Inf. Theory.