Near-optimal dynamic replication in unstructured peer-to-peer networks

Replicating data in distributed systems is often needed for availability and performance. In unstructured peer-to-peer networks that use epidemic messaging for query routing, replicating popular data items is also crucial to ensure a high probability of finding the data within a bounded search distance from the requester. This paper considers such networks and aims to maximize the probability of successful search. Prior work along these lines has analyzed the optimal degrees of replication for data items with non-uniform but global request rates, but it did not address where replicas should be placed and was very limited in its ability to handle heterogeneity and dynamics of the network and workload. This paper presents the integrated P2R2 algorithm for dynamic replication, which addresses all these issues and determines both the degrees of replication and the placement of the replicas in a provably near-optimal way. We prove that the P2R2 algorithm guarantees a successful-search probability within a factor of 2 of the optimal solution. The algorithm is efficient and can handle workload evolution. We prove that, whenever the access patterns are in steady state, our algorithm converges to the desired near-optimal placement. We further show by simulations that the convergence rate is fast and that our algorithm outperforms prior methods.
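
To make the objective concrete, the following minimal sketch evaluates the successful-search probability of a given replica placement. It is not the P2R2 algorithm. It assumes a simplified model in which a search deterministically reaches every peer within `max_hops` of the requester, whereas epidemic messaging is probabilistic. The names `within_hops`, `success_probability`, `adj`, `request_rate`, and `placement` are hypothetical and chosen for illustration only.

```python
from collections import deque


def within_hops(adj, source, max_hops):
    """Return the set of peers reachable from `source` in at most `max_hops` hops."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the search bound
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def success_probability(adj, request_rate, placement, max_hops):
    """Fraction of the request load that finds a replica within `max_hops`.

    adj:          dict mapping each peer to a list of neighboring peers
    request_rate: dict mapping (peer, item) to that peer's request rate for the item
    placement:    dict mapping each item to the set of peers holding a replica
    Assumption: a search succeeds iff some replica lies within `max_hops`.
    """
    total = sum(request_rate.values())
    hit = 0.0
    for (peer, item), rate in request_rate.items():
        reachable = within_hops(adj, peer, max_hops)
        if reachable & placement.get(item, set()):
            hit += rate
    return hit / total
```

Because the request rates are indexed per peer rather than globally, the same function shows why placement matters and not only the number of replicas: moving a replica closer to the peers that request it most raises the returned value without changing the replication degree.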
