Making search efficient on Gnutella-like P2P systems

Leveraging state-of-the-art information retrieval (IR) techniques such as the vector space model (VSM) and relevance ranking, we present GES, an efficient IR system built on top of Gnutella-like P2P networks. The key idea is that GES employs a distributed, content-based, and capacity-aware topology adaptation algorithm to organize nodes (each of which is represented by a node vector) into semantic groups. The intuition behind this design is that semantically associated nodes within a semantic group tend to be relevant to the same queries. Given a query, GES uses a capacity-aware search protocol, based on semantic groups and selective one-hop node vector replication, to direct the query to the nodes most relevant to it, thereby achieving high recall while probing only a small fraction of nodes. Moreover, GES adopts automatic query expansion techniques to improve the quality of search results, and it is the first work to show that node vector size plays a very important role in system performance. The experimental results show that GES is very efficient and even outperforms centralized node clustering systems such as SETS.
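
The role of a node vector in directing queries can be illustrated with a minimal sketch. This is not the GES implementation: it assumes that a node vector is the plain sum of raw term-frequency vectors of the node's documents and that relevance is cosine similarity, as in a basic VSM. The function names (`term_vector`, `node_vector`, `cosine`, `rank_nodes`) and the sample data are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector (no stemming, stop words, or weighting)."""
    return Counter(text.lower().split())


def node_vector(documents):
    """Assumption: summarize a node as the sum of its documents' term vectors."""
    total = Counter()
    for doc in documents:
        total.update(term_vector(doc))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_nodes(query, nodes):
    """Return node names ordered from most to least relevant to the query."""
    q = term_vector(query)
    scored = [(cosine(q, vec), name) for name, vec in nodes.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Hypothetical nodes, each holding a few short documents.
nodes = {
    "music": node_vector(["jazz piano recordings", "piano sonata scores"]),
    "systems": node_vector(["peer to peer routing", "distributed hash table routing"]),
}
ranking = rank_nodes("piano jazz", nodes)
```

Under these assumptions, a query is forwarded first to the node whose vector is closest to the query vector. The same similarity measure could also be used to decide which nodes belong in the same semantic group.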
