On the implementation of Zigzag codes for distributed storage system

Erasure codes such as Reed-Solomon (RS) codes are widely used to improve data reliability in distributed storage systems. Although erasure codes greatly reduce storage overhead compared to replication schemes, repairing a failed node remains costly in terms of network bandwidth. To address this problem, we deploy the Zigzag code, an MDS array code with the optimal repair property, in a practical system. Specifically, we first build a general system on Hadoop to evaluate the encoding, decoding, and repair performance of different codes, and then implement Zigzag codes on this system. The experimental results show that the measured behavior of Zigzag codes agrees with the theoretical findings and that the codes offer practical advantages. Compared to current HDFS modules that use RS codes, our Zigzag-based HDFS implementation significantly reduces repair disk I/O and repair bandwidth at the same computational complexity.
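To make the repair-cost comparison concrete, the following minimal sketch counts how much data is read to rebuild one failed systematic node. It assumes conventional RS repair reads k whole blocks, and that a Zigzag code attains the optimal rebuilding ratio 1/r (with r = n - k parity nodes), reading that fraction of each of the n - 1 surviving nodes. The function names and the (n, k) values are illustrative assumptions, not the configuration or measurements reported in this work.

```python
from fractions import Fraction


def rs_repair_reads(k: int) -> Fraction:
    """Block-equivalents read by conventional RS repair: k whole blocks."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Fraction(k)


def zigzag_repair_reads(n: int, k: int) -> Fraction:
    """Block-equivalents read at the optimal rebuilding ratio 1/r.

    Assumes one failed systematic node and r = n - k parity nodes;
    a fraction 1/r of each of the n - 1 surviving nodes is read.
    """
    r = n - k
    if k < 1 or r < 1:
        raise ValueError("need k >= 1 and n > k")
    return Fraction(n - 1, r)


def repair_saving(n: int, k: int) -> Fraction:
    """Fraction of repair reads saved relative to conventional RS repair."""
    return 1 - zigzag_repair_reads(n, k) / rs_repair_reads(k)


if __name__ == "__main__":
    # Hypothetical code parameters chosen only for illustration.
    for n, k in [(6, 4), (14, 10)]:
        print(n, k, rs_repair_reads(k), zigzag_repair_reads(n, k),
              float(repair_saving(n, k)))
```

Under these assumptions, a (6, 4) code reads 5/2 block-equivalents instead of 4. This is only the theoretical ratio that such an implementation would be expected to approach; actual HDFS measurements depend on block layout and I/O granularity.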
