The Repair Problem for Reed–Solomon Codes: Optimal Repair of Single and Multiple Erasures With Almost Optimal Node Size

The repair problem in distributed storage addresses recovery of data encoded using an erasure code, for instance, a Reed–Solomon (RS) code. We consider the problem of repairing a single node or multiple nodes in RS-coded storage systems using the smallest possible amount of inter-nodal communication. According to the cut-set bound, the communication cost of repairing $h\geqslant 1$ failed nodes for an $(n,k=n-r)$ maximum distance separable (MDS) code using $d$ helper nodes is at least $dhl/(d+h-k)$, where $l$ is the size of the node. Guruswami and Wootters (2016) initiated the study of efficient repair of RS codes, showing that they can be repaired with less bandwidth than under the trivial approach. At the same time, their work and the follow-up papers stopped short of constructing RS codes (or any scalar MDS codes) that meet the cut-set bound with equality. In this paper, we construct families of RS codes that achieve the cut-set bound for the repair of one or several nodes. In the single-node case, we present RS codes of length $n$ over the field $\mathbb{F}_{q^{l}}$, $l=\exp((1+o(1))n\log n)$, that meet the cut-set bound. We also prove an almost matching lower bound on $l$, showing that super-exponential scaling is both necessary and sufficient for scalar MDS codes to achieve the cut-set bound using linear repair schemes.
For the case of multiple nodes, we construct a family of RS codes that achieve the cut-set bound universally for the repair of any $h=1,2,\dots,r$ failed nodes from any subset of $d$ helper nodes, $k\leqslant d\leqslant n-h$. For a fixed number of parities $r$, the node size of the constructed codes is close to the smallest possible node size for codes with such properties.
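
To make the cut-set bound $dhl/(d+h-k)$ concrete, the following minimal sketch evaluates it and compares it with the trivial approach of downloading $k$ full nodes. The function names and the parameters $(n,k)=(14,10)$, $l=4$ are illustrative assumptions, not values taken from the paper; in particular, $l=4$ is far smaller than the node sizes the constructions require.

```python
from fractions import Fraction


def cut_set_bound(n, k, d, h, l):
    """Lower bound d*h*l/(d+h-k) on the repair bandwidth, in symbols of F_q.

    n, k : length and dimension of the MDS code (r = n - k parities)
    d    : number of helper nodes, k <= d <= n - h
    h    : number of failed nodes, 1 <= h <= n - k
    l    : node size (each node stores l symbols of F_q)
    """
    if not (1 <= h <= n - k):
        raise ValueError("need 1 <= h <= n - k")
    if not (k <= d <= n - h):
        raise ValueError("need k <= d <= n - h")
    # Fraction keeps the bound exact when d + h - k does not divide d*h*l.
    return Fraction(d * h * l, d + h - k)


def trivial_repair_bandwidth(k, l):
    """Trivial repair: download the full contents of k nodes."""
    return k * l


if __name__ == "__main__":
    n, k, l = 14, 10, 4  # hypothetical parameters, for illustration only
    print("trivial:", trivial_repair_bandwidth(k, l))
    print("h=1, d=13:", cut_set_bound(n, k, d=13, h=1, l=l))
    print("h=2, d=12:", cut_set_bound(n, k, d=12, h=2, l=l))
```

With these assumed parameters, repairing one node from $d=13$ helpers needs at least 13 symbols and repairing two nodes from $d=12$ helpers needs at least 24, compared with 40 symbols for the trivial approach. When $d=k$ and $h=1$ the bound equals $kl$, so there is no saving over trivial repair.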
