Network Coding for Distributed Storage Systems

Peer-to-peer distributed storage systems provide reliable access to data through redundancy spread over nodes across the Internet. A key goal is to minimize the bandwidth used to maintain that redundancy. Storing a file as erasure-coded fragments spread across nodes promises to require less redundancy, and hence less maintenance bandwidth, than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate a new fragment in a distributed way while transferring as little data as possible across the network. In this paper, we introduce a general technique for analyzing storage architectures that combine any form of coding and replication, and we present two new schemes for maintaining redundancy using erasure codes. First, we show how to optimally generate MDS (maximum distance separable) fragments directly from existing fragments in the system. Second, we introduce a new scheme called regenerating codes, which use slightly larger fragments than MDS codes but have lower overall bandwidth use. We also show through simulation that, in realistic environments, regenerating codes can reduce maintenance bandwidth use by 25% or more compared with the best previous design (a hybrid of replication and erasure codes) while simplifying system architecture.
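
The tension between storage redundancy and repair traffic can be illustrated with a small calculation. The sketch below is not one of the schemes introduced in this paper; it only compares whole-file replication with an (n, k) MDS code under a naive repair strategy, in which a replacement node downloads k surviving fragments, reconstructs the file, and re-encodes a single fragment. The function names, the file size, and the (14, 7) parameters are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def replication_costs(file_size, copies):
    """Return (total_storage, repair_download, bytes_restored) for whole-file replicas."""
    total_storage = file_size * copies
    # Replacing one lost replica means copying the whole file once.
    return total_storage, file_size, file_size


def naive_mds_costs(file_size, n, k):
    """Return the same triple for an (n, k) MDS code with naive repair.

    Naive repair (an assumption of this sketch): the replacement node downloads
    k surviving fragments, reconstructs the file, and re-encodes one fragment.
    """
    fragment = Fraction(file_size, k)  # each fragment holds 1/k of the file
    total_storage = n * fragment
    repair_download = k * fragment  # equals the whole file
    return total_storage, repair_download, fragment


if __name__ == "__main__":
    file_size = 700  # hypothetical size, arbitrary units
    schemes = {
        "2x replication": replication_costs(file_size, copies=2),
        "(14, 7) MDS, naive repair": naive_mds_costs(file_size, n=14, k=7),
    }
    for name, (storage, download, restored) in schemes.items():
        ratio = Fraction(download, restored)
        print(f"{name}: storage={storage}, download per repair={download}, "
              f"restored={restored}, download/restored={ratio}")
```

With these illustrative parameters both schemes store twice the file size, and the MDS code survives the loss of any 7 of its 14 fragments, whereas two replicas survive only one loss. The naive MDS repair, however, moves k times more data than it restores, which is the cost that generating fragments directly from existing fragments is meant to reduce.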
