Failure recovery for structured p2p networks: Protocol design and performance under churn

Measurement studies indicate a high rate of node dynamics in p2p systems. In this paper, we ask how high a rate of node dynamics structured p2p networks can support. We confine our study to the hypercube routing scheme used by several structured p2p systems. To improve system robustness and facilitate failure recovery, we introduce the property of K-consistency, K ≥ 1, which generalizes the previously defined notion of consistency. (Consistency guarantees connectivity from any node to any other node.) We design and evaluate a failure recovery protocol for K-consistent networks that relies only on local information. The failure recovery protocol is then integrated with a join protocol that has been proved to construct K-consistent neighbor tables for concurrent joins. We evaluated the integrated protocols in a set of simulation experiments in which nodes joined a 2000-node network and nodes (both old and new) were randomly selected to fail concurrently over 10,000 s of simulated time. In each such "churn" experiment, we took a "snapshot" of the neighbor tables in the network once every 50 s and evaluated connectivity and consistency measures over time as a function of the churn rate, the timeout value used in failure recovery, and K. We found our protocols to be effective, efficient, and stable for an average node lifetime as low as 8.3 min. Experimental results also show that the average routing delay of our protocols increases only slightly even when the churn rate is greatly increased.
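The abstract does not spell out K-consistency formally, so the following is only a minimal sketch of how a snapshot check might look. It assumes the suffix-matching form of hypercube routing: the entry at (level i, digit j) of node x may hold only nodes whose ID ends in j followed by the rightmost i digits of x, and a table set is treated as K-consistent when every entry holds min(K, number of qualifying live nodes) such neighbors. The function names, the string IDs, and the tiny binary ID space are hypothetical choices made for illustration, not the authors' implementation.

```python
def required_suffix(node_id: str, level: int, digit: str) -> str:
    """Suffix a neighbor must have to sit in entry (level, digit) of node_id."""
    tail = node_id[-level:] if level > 0 else ""
    return digit + tail


def build_tables(nodes, digits, k):
    """Build idealized neighbor tables: up to k qualifying nodes per entry."""
    tables = {}
    for x in nodes:
        table = {}
        for level in range(len(x)):
            for digit in digits:
                suffix = required_suffix(x, level, digit)
                candidates = sorted(n for n in nodes if n.endswith(suffix))
                table[(level, digit)] = candidates[:k]
        tables[x] = table
    return tables


def is_k_consistent(tables, live_nodes, digits, k):
    """Check one snapshot: every entry of every live node must hold
    min(k, #qualifying live nodes) neighbors, all live and qualifying."""
    live = set(live_nodes)
    for x in live:
        for level in range(len(x)):
            for digit in digits:
                suffix = required_suffix(x, level, digit)
                qualified = {n for n in live if n.endswith(suffix)}
                entry = set(tables[x].get((level, digit), []))
                if not entry <= qualified:
                    return False  # entry points at a failed or wrong node
                if len(entry) < min(k, len(qualified)):
                    return False  # entry is missing available neighbors
    return True


if __name__ == "__main__":
    nodes = ["000", "001", "010", "011", "100", "111"]
    tables = build_tables(nodes, "01", k=2)
    print(is_k_consistent(tables, nodes, "01", k=2))
    # Simulate one failure without any recovery and re-check the snapshot.
    survivors = [n for n in nodes if n != "011"]
    print(is_k_consistent(tables, survivors, "01", k=2))
```

In this sketch a failure leaves stale entries behind, so the second check fails until some recovery step replaces the failed neighbor with another qualifying node, which is the role the abstract assigns to the failure recovery protocol.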
