A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems

In distributed storage systems, erasure codes represent an attractive solution to add redundancy to stored data while limiting the storage overhead. They are able to provide the same reliability as replication requiring much less storage space. Erasure coding breaks the data into pieces that are encoded and then stored on different nodes. However, when storage nodes permanently abandon the system, new redundant pieces must be created. For erasure codes, generating a new piece requires the transmission of k pieces over the network, resulting in a k times higher reconstruction traffic as compared to replication. Dimakis proposed a new class of codes, called Regenerating Codes, which are able to provide both the storage efficiency of erasure codes and the communication efficiency of replication. However, Dimakis gave only a theoretical description of the codes without discussing implementation issues or computational costs. We have done a real implementation of Random Linear Regenerating Codes that allows us to measure their computational cost, which can be significant if the parameters are not chosen properly. However, we also find that there exist parameter values that result in a significant reduction of the communication overhead at the expense of a small increase in storage cost and computation, which makes these codes very attractive for distributed storage systems.

[1]  A. Dimakis,et al.  Deterministic Regenerating Codes for Distributed Storage Yunnan , 2007 .

[2]  Rudolf Ahlswede,et al.  Network information flow , 2000, IEEE Trans. Inf. Theory.

[3]  Ernst W. Biersack,et al.  Hierarchical Codes: How to Make Erasure Codes Attractive for Peer-to-Peer Storage Systems , 2008, 2008 Eighth International Conference on Peer-to-Peer Computing.

[4]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[5]  Shuo-Yen Robert Li,et al.  Linear network coding , 2003, IEEE Trans. Inf. Theory.

[6]  John Kubiatowicz,et al.  Design and evaluation of distributed wide-area on-line archival storage systems , 2006 .

[7]  Jacob R. Lorch,et al.  Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OSDI '02.

[8]  Muriel Medard,et al.  How good is random linear coding based distributed networked storage , 2005 .

[9]  James S. Plank A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems , 1997 .

[10]  Rodrigo Rodrigues,et al.  High Availability in DHTs: Erasure Coding vs. Replication , 2005, IPTPS.

[11]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[12]  David R. Karger,et al.  Wide-area cooperative storage with CFS , 2001, SOSP.

[13]  Andreas Haeberlen,et al.  Glacier: highly durable, decentralized storage despite massive correlated failures , 2005, NSDI.