Epidemic Sampling for Search in Unstructured Peer-to-Peer Networks

Unstructured peer-to-peer (P2P) networks are self-organizing and dynamic. Therefore, they are unindexable, and without indexing, efficient search is possible only by disseminating the query efficiently and intelligently to scan the network nodes/objects. Existing search mechanisms are few, inefficient, and naive, and they lack a theoretical foundation. In this paper, we first define a query model that formalizes and generalizes the typical P2P exact-match search queries to partial selection queries, i.e., selection queries that can be satisfied by a partial result-set (rather than the entire result-set). Compared to exact-match queries, partial selection queries are not only more general, but also more cost-efficient and practical. Even though the existing search mechanisms can be applied to answer such queries, none of them is designed to answer these queries intelligently and efficiently. Subsequently, we introduce our simple but elegant epidemic-based search mechanism, termed the SIR sampling mechanism, which is specifically designed to answer partial selection queries efficiently. Based on a rigorous percolation analysis, we can tune our SIR sampling mechanism on-the-fly and per-query to take a sample of the network nodes/objects that is just large enough to satisfy the partial selection query. Our empirical study shows that SIR sampling can strike a balance between the communication cost and the response time of the query. For the common case of P2P partial selection queries, SIR sampling outperforms flooding by up to two orders of magnitude in communication cost while maintaining a tolerable response time. SIR sampling also outperforms a 32-random-walker search in both response time and communication cost.
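
The abstract does not give the details of the SIR sampling mechanism, so the following is only a rough sketch of the general idea of epidemic (SIR-style) query dissemination, not the authors' algorithm. It assumes a hypothetical random overlay (`make_random_graph`) and a single hypothetical tuning knob, `forward_prob`: each node that receives the query for the first time forwards it once, to each neighbor independently with that probability, and then stops. Setting `forward_prob` to 1.0 reduces the sketch to flooding; lower values reach a smaller sample of nodes with fewer messages.

```python
import random
from collections import deque


def make_random_graph(n, degree, rng):
    """Hypothetical overlay: each node links to up to `degree` random peers (undirected)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for u in rng.sample(range(n), degree):
            if u != v:
                adj[v].add(u)
                adj[u].add(v)
    return adj


def epidemic_sample(adj, source, forward_prob, rng):
    """Spread a query from `source`; return (set of nodes reached, messages sent).

    Assumed rule, for illustration only: a node forwards the query exactly
    once (when it first receives it), to each neighbor with probability
    `forward_prob`, and is inactive afterwards.
    """
    visited = {source}
    frontier = deque([source])
    messages = 0
    while frontier:
        node = frontier.popleft()
        for neighbor in adj[node]:
            if rng.random() < forward_prob:
                messages += 1  # the message is sent even if the neighbor already has the query
                if neighbor not in visited:
                    visited.add(neighbor)
                    frontier.append(neighbor)
    return visited, messages


if __name__ == "__main__":
    rng = random.Random(0)
    overlay = make_random_graph(1000, 3, rng)
    for p in (1.0, 0.6, 0.4):
        nodes, msgs = epidemic_sample(overlay, 0, p, rng)
        print(f"forward_prob={p}: reached {len(nodes)} nodes with {msgs} messages")
```

In this toy model the sample size and message count both depend on `forward_prob`; the per-query tuning described in the abstract would correspond to choosing such a parameter so that the sample is just large enough for the requested partial result-set.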
