Cooperative repair with minimum-storage regenerating codes for distributed storage

Distributed storage systems store redundant data to tolerate storage-node failures, and lost data must be repaired when nodes fail. Minimum-storage regenerating (MSR) codes, a class of MDS codes, are designed to minimize the bandwidth consumed when repairing a single failure. When multiple failures occur, repairing them cooperatively can save further bandwidth compared with repairing each failure individually. In this paper, we present a new construction of minimum-storage cooperative regenerating (MSCR) codes that repair two failures cooperatively and exactly. We show that, given a valid instance of linear exact MSR codes, we can construct a corresponding repair procedure that repairs any two failures cooperatively with optimal bandwidth consumption; that is, we construct an instance of exact MSCR codes directly from exact MSR codes. With this connection, MSCR codes can also repair any single failure exactly.
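To make the bandwidth saving concrete, the following is a minimal sketch that compares per-newcomer repair bandwidth for individual and cooperative repair. The abstract does not state these expressions; the sketch assumes the standard cut-set bounds from the regenerating-code literature, with a file of size M, any k nodes sufficient to recover it, each newcomer contacting d surviving nodes, and t failures repaired together. The function names and the example parameters are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def msr_repair_bandwidth(M, k, d):
    """Bandwidth to repair one failure at the MSR point (assumed cut-set bound).

    Each node stores M/k; the newcomer downloads M / (k * (d - k + 1))
    from each of d surviving nodes.
    """
    if d < k:
        raise ValueError("a newcomer must contact at least k survivors")
    return Fraction(d * M, k * (d - k + 1))


def mscr_repair_bandwidth(M, k, d, t=2):
    """Per-newcomer bandwidth when t failures are repaired cooperatively
    at the MSCR point (assumed cut-set bound).

    Each newcomer downloads M / (k * (d - k + t)) from each of d survivors
    and the same amount from each of the other t - 1 newcomers.
    """
    if d < k or t < 1:
        raise ValueError("need d >= k and t >= 1")
    return Fraction((d + t - 1) * M, k * (d - k + t))


# Hypothetical example: M = 6 units, k = 3, two failures, d = 4 survivors.
individual = msr_repair_bandwidth(6, 3, 4)       # each failure repaired alone
cooperative = mscr_repair_bandwidth(6, 3, 4, 2)  # two failures repaired together
print(individual, cooperative)
```

With t = 1 the cooperative expression reduces to the MSR expression, which matches the single-failure case mentioned above.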
