Proactive Replication for Data Durability

Many wide-area storage systems replicate data for durability. A common way of maintaining the replicas is to detect node failures and respond by creating additional copies of objects that were stored on failed nodes and hence suffered a loss of redundancy. Reactive techniques can minimize total bytes sent since they only create replicas as needed; however, they can create spikes in network use after a failure. These spikes may overwhelm application traffic and can make it difficult to provision bandwidth. This paper explores a proactive approach that creates additional copies not in response to failures, but periodically at a fixed low rate. We introduce Tempo, a distributed hash table that allows each user to specify a maximum maintenance bandwidth and uses it to perform proactive replication. Results from a simulation study suggest that Tempo can deliver high durability despite only using several kilobytes per second of bandwidth, comparable to state-ofthe-art reactive systems.

[1]  Emin Gün Sirer,et al.  Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays , 2004, NSDI.

[2]  Robert Morris,et al.  A distributed hash table , 2006 .

[3]  Brighten Godfrey,et al.  OpenDHT: a public DHT service and its uses , 2005, SIGCOMM '05.

[4]  Stefan Savage,et al.  Total Recall: System Support for Automated Availability Management , 2004, NSDI.

[5]  Miguel Castro,et al.  Proactive recovery in a Byzantine-fault-tolerant system , 2000, OSDI.

[6]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[7]  Rodrigo Rodrigues,et al.  High Availability in DHTs: Erasure Coding vs. Replication , 2005, IPTPS.

[8]  David R. Karger,et al.  OverCite: A Cooperative Digital Research Library , 2005, IPTPS.

[9]  Geoffrey M. Voelker,et al.  On Object Maintenance in Peer-to-Peer Systems , 2006, IPTPS.

[10]  Robert Tappan Morris,et al.  Designing a DHT for Low Latency and High Throughput , 2004, NSDI.

[11]  Robert Tappan Morris,et al.  Bandwidth-efficient management of DHT routing tables , 2005, NSDI.

[12]  Marius A. Eriksen,et al.  Trickle: A Userland Bandwidth Shaper for UNIX-like Systems , 2005, USENIX Annual Technical Conference, FREENIX Track.

[13]  Joseph Pasquale,et al.  Analysis of Long-Running Replicated Systems , 2006, Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications.

[14]  M. Dahlin,et al.  TCP Nice: a mechanism for background transfers , 2002, OSDI '02.

[15]  Kirk L. Johnson,et al.  Overcast: reliable multicasting with on overlay network , 2000, OSDI.

[16]  Ben Y. Zhao,et al.  Pond: The OceanStore Prototype , 2003, FAST.

[17]  John Kubiatowicz,et al.  Handling churn in a DHT , 2004 .

[18]  Liuba Shrira,et al.  The design of a robust peer-to-peer system , 2002, EW 10.

[19]  KyoungSoo Park,et al.  CoMon: a mostly-scalable monitoring system for PlanetLab , 2006, OPSR.

[20]  David E. Culler,et al.  A blueprint for introducing disruptive technology into the Internet , 2003, CCRV.

[21]  Andreas Haeberlen,et al.  Glacier: highly durable, decentralized storage despite massive correlated failures , 2005, NSDI.

[22]  J. Kubiatowicz,et al.  Long-Term Data Maintenance in Wide-Area Storage Systems : A Quantitative Approach , 2005 .

[23]  Andreas Haeberlen,et al.  Experiences in building and operating ePOST, a reliable peer-to-peer application , 2006, EuroSys '06.

[24]  Andreas Haeberlen,et al.  Efficient Replica Maintenance for Distributed Storage Systems , 2006, NSDI.

[25]  Rodrigo Rodrigues,et al.  Proceedings of Hotos Ix: the 9th Workshop on Hot Topics in Operating Systems Hotos Ix: the 9th Workshop on Hot Topics in Operating Systems High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two , 2022 .

[26]  G. Cox,et al.  ~ " " " ' l I ~ " " -" . : -· " J , 2006 .