Probabilistic file indexing and searching in unstructured peer-to-peer networks

Thanks to the advance of network and computing technology, Peer-to-Peer (P2P) has become a popular way for file sharing. A huge amount of files can now be directly accessed and downloaded by a simple mouse click. Among the types of P2P networks, unstructured architecture has been proven quite successful, mainly due to its simplicity and robustness. However, searching for distant and rare files is still a challenging problem in unstructured P2P networks. Existing approaches either have poor response time, or generate too much network traffic. In this paper we propose a simple, practical, yet powerful index scheme to enhance search in unstructured P2P networks. The index scheme uses a data structure ''Bloom filters'' to index files shared at each node, and then lets nodes gossip to one another to exchange their Bloom filters. In effect, each node indexes a random set of files in the network, thereby allowing every query to have a constant probability to be successfully resolved within a fixed search space. The experimental results show that our approach can improve the search in Gnutella by an order of magnitude. For example, in a typical Gnutella network consisting of about 89,000 nodes, by replicating a node's Bloom filter to less than 0.45% of the nodes in the network, 70% of the queries can be resolved within a search space of 200 nodes. In contrast, within the same search space size, only 1.6% of the queries can be resolved without the index scheme; or, alternatively, more than 48,000 nodes need to be searched in Gnutella in order to reach the same success rate as our index scheme.

[1]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM 2001.

[2]  Ben Y. Zhao,et al.  Tapestry: a resilient global-scale overlay for service deployment , 2004, IEEE Journal on Selected Areas in Communications.

[3]  Edith Cohen,et al.  Search and replication in unstructured peer-to-peer networks , 2002 .

[4]  Vana Kalogeraki,et al.  Finding good peers in peer-to-peer networks , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[5]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[6]  Ian Clarke,et al.  Freenet: A Distributed Anonymous Information Storage and Retrieval System , 2000, Workshop on Design Issues in Anonymity and Unobservability.

[7]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[8]  Matei Ripeanu,et al.  Peer-to-peer architecture case study: Gnutella network , 2001, Proceedings First International Conference on Peer-to-Peer Computing.

[9]  Amin Vahdat,et al.  Efficient Peer-to-Peer Keyword Searching , 2003, Middleware.

[10]  Eytan Adar,et al.  Free Riding on Gnutella , 2000, First Monday.

[11]  Stefan Saroiu,et al.  A Measurement Study of Peer-to-Peer File Sharing Systems , 2001 .

[12]  Bruce M. Maggs,et al.  Efficient content location using interest-based locality in peer-to-peer systems , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[13]  George Varghese,et al.  Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications , 2001, SIGCOMM 2001.

[14]  Hector Garcia-Molina,et al.  Improving search in peer-to-peer networks , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[15]  John Kubiatowicz,et al.  Probabilistic location and routing , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[16]  Edith Cohen,et al.  Replication strategies in unstructured peer-to-peer networks , 2002, SIGCOMM.

[17]  Richard P. Martin,et al.  PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities , 2003, High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on.

[18]  Miguel Castro,et al.  Controlling the Cost of Reliability in Peer-to-Peer Overlays , 2003, IPTPS.

[19]  Lada A. Adamic,et al.  Search in Power-Law Networks , 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20]  Robert Morris,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM 2001.

[21]  Anukool Lakhina,et al.  BRITE: Universal Topology Generation from a User''s Perspective , 2001 .

[22]  Yiwei Thomas Hou,et al.  Guest Editorial Recent Advances in Service Overlay Networks , 2004 .

[23]  Dimitrios Gunopulos,et al.  A local search mechanism for peer-to-peer networks , 2002, CIKM '02.