Repairing Reed-Solomon Codes

We study the performance of Reed–Solomon (RS) codes for the exact repair problem in distributed storage. Our main result is that, in some parameter regimes, Reed–Solomon codes are optimal regenerating codes among maximum distance separable (MDS) codes with linear repair schemes. Moreover, we give a characterization of MDS codes with linear repair schemes that holds in any parameter regime and can be used to give non-trivial repair schemes for RS codes in other settings. More precisely, we show that for $k$-dimensional RS codes whose evaluation points are a finite field of size $n$, there are exact repair schemes with bandwidth $(n-1)\log((n-1)/(n-k))$ bits, and that this is optimal for any MDS code with a linear repair scheme. In contrast, the naive (commonly implemented) repair algorithm for this RS code has bandwidth $k\log(n)$ bits. When the entire field is used as evaluation points, the number of nodes $n$ is much larger than the number of bits per node (which is $O(\log(n))$), so this result holds only when the degree of sub-packetization is small. However, our method applies in any parameter regime, and to illustrate this for high levels of sub-packetization, we give an improved repair scheme for a specific (14,10)-RS code used in the Facebook Hadoop Analytics cluster.
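
To make the gap between the two bandwidth expressions concrete, the following minimal sketch just evaluates $(n-1)\log((n-1)/(n-k))$ and $k\log(n)$ for a field of size $n = 256$ (8 bits per node) and two illustrative values of $k$. It assumes logarithms are base 2, since the abstract writes $\log$ without a base, and it only plugs numbers into the stated formulas; it does not check the parameter conditions under which the abstract's result applies. The function names are hypothetical.

```python
import math

# Evaluates the two repair-bandwidth expressions from the abstract, assuming
# log base 2 so that results are in bits. No regime conditions are checked.

def scheme_repair_bits(n: int, k: int) -> float:
    """(n - 1) * log2((n - 1) / (n - k)): bandwidth of the RS repair scheme."""
    return (n - 1) * math.log2((n - 1) / (n - k))

def naive_repair_bits(n: int, k: int) -> float:
    """k * log2(n): download k whole symbols of log2(n) bits each."""
    return k * math.log2(n)

if __name__ == "__main__":
    # Illustrative parameters only: n = 256 means symbols of log2(256) = 8 bits.
    for n, k in [(256, 128), (256, 192)]:
        s, naive = scheme_repair_bits(n, k), naive_repair_bits(n, k)
        print(f"n={n} k={k}: scheme {s:.1f} bits, naive {naive:.1f} bits, "
              f"ratio {naive / s:.2f}")
```

For these parameters the naive download is roughly four times (for $k = 128$) and three times (for $k = 192$) larger than the scheme's bandwidth.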

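The abstract states the repair bandwidth but not the mechanism. As a hedged illustration, here is a minimal sketch of one trace-based linear repair scheme in the smallest interesting full-length case, assuming an RS code over GF(2^4) evaluated at all $n = 16$ field elements with $k = 8$, so that $n - k = 8 = 2^{4-1}$. Under that assumption, for an erased node $\alpha^*$ the polynomials $p_i(x) = \mathrm{Tr}(u_i(x-\alpha^*))/(x-\alpha^*)$ have degree $7 < n - k$, so their evaluation vectors are dual codewords (the dual of a full-length RS code is again an RS code, of dimension $n - k$), and each surviving node $\alpha$ only needs to send the single bit $\mathrm{Tr}(f(\alpha)/(\alpha-\alpha^*))$. The field polynomial $x^4 + x + 1$, the polynomial basis $u_i = 2^i$, and all function names are choices made for this example, not details taken from the paper.

```python
import random

# Sketch: trace-based repair of one erased symbol of a full-length RS code over
# GF(2^4) (all 16 field elements are evaluation points), assuming n - k >= 8
# so that every surviving node sends a single bit.

T = 4                 # extension degree of GF(2^4) over GF(2)
Q = 1 << T            # field size, also the code length n
POLY = 0b10011        # x^4 + x + 1 (example choice of primitive polynomial)

def gf_mul(a, b):
    """Multiply two GF(2^4) elements: carry-less multiply, reduce mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & Q:
            a ^= POLY
    return r

def gf_inv(a):
    """Inverse as a^(Q-2); only called with a != 0."""
    r, e = 1, Q - 2
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def trace(y):
    """Field trace to GF(2): y + y^2 + y^4 + y^8, which is always 0 or 1."""
    s, z = 0, y
    for _ in range(T):
        s ^= z
        z = gf_mul(z, z)
    return s

def poly_eval(coeffs, x):
    """Horner evaluation of sum(coeffs[i] * x^i) over GF(2^4)."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def encode(msg):
    """RS codeword: evaluate the message polynomial at every field element."""
    return [poly_eval(msg, a) for a in range(Q)]

def repair(codeword, star):
    """Recover codeword[star] from one bit per surviving node."""
    # Node a != star sends Tr(f(a) / (a - star)); subtraction is XOR here.
    bits = {a: trace(gf_mul(codeword[a], gf_inv(a ^ star)))
            for a in range(Q) if a != star}
    basis = [1 << i for i in range(T)]   # polynomial basis u_i of GF(16)/GF(2)
    # Tr(u_i f(star)) = sum over a of Tr(u_i (a - star)) * bits[a], over GF(2).
    targets = [0] * T
    for i, u in enumerate(basis):
        for a, b in bits.items():
            targets[i] ^= trace(gf_mul(u, a ^ star)) & b
    # The T traces determine f(star) uniquely; brute-force the 16 candidates.
    matches = [z for z in range(Q)
               if all(trace(gf_mul(u, z)) == targets[i]
                      for i, u in enumerate(basis))]
    assert len(matches) == 1
    return matches[0], len(bits)         # recovered symbol, bits downloaded

if __name__ == "__main__":
    K = 8  # message length; the sketch needs n - K >= 2^(T-1), i.e. K <= 8
    random.seed(0)
    cw = encode([random.randrange(Q) for _ in range(K)])
    for star in range(Q):
        value, nbits = repair(cw, star)
        assert value == cw[star]
    print(f"repaired each node with {nbits} bits; naive repair needs {K * T}")
```

Running it repairs every position of a random codeword with 15 bits (one from each surviving node), against $k \log_2 n = 32$ bits for naive repair, which reads $k$ full 4-bit symbols. Brute-forcing the 16 candidate values keeps the sketch short; solving with a dual basis would avoid the search.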