Cooperative pipelined regeneration in distributed storage systems

In distributed storage systems, a substantial volume of data are stored in a distributed fashion, across a large number of storage nodes. To maintain data integrity, when existing storage nodes fail, lost data are regenerated at replacement nodes. Regenerating multiple data losses in batches can reduce the consumption of bandwidth. However, existing schemes are only able to achieve lower bandwidth consumption by utilizing a large number of participating nodes. In this paper, we propose a cooperative pipelined regeneration process that regenerates multiple data losses cooperatively with much fewer participating nodes. We show that cooperative pipelined regeneration is not only able to maintain optimal data integrity, but also able to further reduce the consumption of bandwidth as well.

[1]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[2]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[3]  Kenneth W. Shum Cooperative Regenerating Codes for Distributed Storage Systems , 2011, 2011 IEEE International Conference on Communications (ICC).

[4]  Stefan Savage,et al.  Total Recall: System Support for Automated Availability Management , 2004, NSDI.

[5]  Kannan Ramchandran,et al.  Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions , 2010, IEEE Transactions on Information Theory.

[6]  Ernst W. Biersack,et al.  A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems , 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.

[7]  Nihar B. Shah,et al.  Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction , 2010, IEEE Transactions on Information Theory.

[8]  R. Koetter,et al.  The benefits of coding over routing in a randomized setting , 2003, IEEE International Symposium on Information Theory, 2003. Proceedings..

[9]  Dimitris S. Papailiopoulos,et al.  Simple regenerating codes: Network coding for cloud storage , 2011, 2012 Proceedings IEEE INFOCOM.

[10]  Frédérique E. Oggier,et al.  Self-Repairing Codes for distributed storage — A projective geometric construction , 2011, 2011 IEEE Information Theory Workshop.

[11]  A. Dimakis,et al.  Deterministic Regenerating Codes for Distributed Storage Yunnan , 2007 .

[12]  Rudolf Ahlswede,et al.  Network information flow , 2000, IEEE Trans. Inf. Theory.

[13]  Frédérique Oggier,et al.  Self-repairing homomorphic codes for distributed storage systems , 2010, 2011 Proceedings IEEE INFOCOM.

[14]  Ernst W. Biersack,et al.  Hierarchical Codes: How to Make Erasure Codes Attractive for Peer-to-Peer Storage Systems , 2008, 2008 Eighth International Conference on Peer-to-Peer Computing.

[15]  Kannan Ramchandran,et al.  On the Existence of Optimal Exact-Repair MDS Codes for Distributed Storage , 2010, ArXiv.

[16]  GhemawatSanjay,et al.  The Google file system , 2003 .

[17]  Xin Wang,et al.  Pipelined Regeneration with Regenerating Codes for Distributed Storage Systems , 2011, 2011 International Symposium on Networking Coding.

[18]  Muriel Medard,et al.  How good is random linear coding based distributed networked storage , 2005 .

[19]  Anne-Marie Kermarrec,et al.  Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes , 2011, 2011 International Symposium on Networking Coding.

[20]  Pei Li,et al.  Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding , 2010, IEEE Journal on Selected Areas in Communications.

[21]  Syed Ali Jafar,et al.  Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient , 2010, ArXiv.

[22]  Kannan Ramchandran,et al.  Fractional repetition codes for repair in distributed storage systems , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).