Repairing Reed-Solomon Codes With Multiple Erasures

Despite their exceptional error-correcting properties, Reed–Solomon (RS) codes have been overlooked in distributed storage applications due to the common belief that they have poor repair bandwidth: a naive repair approach requires reconstructing the whole file in order to recover a single erased codeword symbol. In a recent work, Guruswami and Wootters (STOC’16) proposed a single-erasure repair method for RS codes that achieves the optimal repair bandwidth among all linear encoding schemes. Their key idea is to recover the erased symbol by collecting a sufficiently large number of its traces, each of which can be constructed from a number of traces of other symbols. We extend the trace collection technique to cope with two and three erasures.
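
To make the phrase "recover the erased symbol by collecting a sufficiently large number of its traces" concrete, here is a minimal Python sketch of that one step only. It is not the Guruswami–Wootters repair scheme, nor the multiple-erasure extension: it does not show how each trace is built from traces of other symbols. It assumes the small field GF(2^4) with the primitive polynomial x^4 + x + 1, and the names `gf_mul`, `trace`, `dual_basis`, and `recover` are hypothetical, chosen for illustration.

```python
# Sketch: recover a symbol x in GF(2^4) from the t = 4 traces Tr(u_i * x),
# where {u_i} is a basis of GF(2^4) over GF(2).
# Assumption: field elements are 4-bit integers, reduced modulo x^4 + x + 1.

FIELD_SIZE = 16
MODULUS = 0b10011  # x^4 + x + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:  # degree reached 4: reduce
            a ^= MODULUS
    return result


def trace(x):
    """Field trace Tr(x) = x + x^2 + x^4 + x^8, which lies in GF(2) = {0, 1}."""
    total = 0
    power = x
    for _ in range(4):
        total ^= power
        power = gf_mul(power, power)
    return total


def dual_basis(basis):
    """Find v_j with Tr(u_i * v_j) = 1 if i == j else 0, by exhaustive search."""
    dual = []
    for j in range(len(basis)):
        target = [1 if i == j else 0 for i in range(len(basis))]
        v = next(
            v for v in range(FIELD_SIZE)
            if [trace(gf_mul(u, v)) for u in basis] == target
        )
        dual.append(v)
    return dual


def recover(traces, dual):
    """Rebuild x = sum_i Tr(u_i * x) * v_i from its traces and the dual basis."""
    x = 0
    for t, v in zip(traces, dual):
        if t:
            x ^= v
    return x


BASIS = [1, 2, 4, 8]  # the polynomial basis 1, a, a^2, a^3
DUAL = dual_basis(BASIS)

# Every field element is determined by its four traces.
for x in range(FIELD_SIZE):
    traces = [trace(gf_mul(u, x)) for u in BASIS]
    assert recover(traces, DUAL) == x
```

Each trace is a single element of the base field GF(2), so the four traces together carry exactly as much information as the 4-bit symbol. The bandwidth saving described in the abstract comes from how those traces are obtained from the other symbols, which this sketch leaves out.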
