Enhancing Search Performance on Gnutella-Like P2P Systems

The big challenges facing the search techniques on Gnutella-like peer-to-peer networks are search efficiency and quality of search results. In this paper, leveraging information retrieval (IR) algorithms such as Vector Space Model (VSM) and relevance ranking algorithms, we present GES (Gnutella with Efficient Search) to improve search performance. The key idea is that GES uses a distributed topology adaptation algorithm to organize semantically relevant nodes into same semantic groups by using the notion of node vector. Given a query, GES employs an efficient search protocol to direct the query to the most relevant semantic groups for answers, thereby achieving high recall with probing only a small fraction of nodes. To the best of our knowledge, GES is the first to identify node vector size as an important role in impacting search performance and to show that the node vector size offers a good trade-off between search performance and bandwidth cost. Moreover, GES adopts automatic query expansion and local data clustering to improve search performance. We show that GES is efficient and even outperforms the centralized node clustering system SETS. For example, in the scenario where node capacity is heterogeneous, GES can achieve 73 percent recall when probing only 20 percent nodes, outperforming SETS by about 18 percent.

[1]  Shlomo Moran,et al.  Optimizing result prefetching in web search engines with segmented indices , 2002, TOIT.

[2]  Bruce M. Maggs,et al.  Efficient content location using interest-based locality in peer-to-peer systems , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[3]  Sandhya Dwarkadas,et al.  Hybrid Global-Local Indexing for Efficient Peer-to-Peer Information Retrieval , 2004, NSDI.

[4]  Edith Cohen,et al.  Search and replication in unstructured peer-to-peer networks , 2002, ICS '02.

[5]  Ben Y. Zhao,et al.  Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and , 2001 .

[6]  Scott Shenker,et al.  Making gnutella-like P2P systems scalable , 2003, SIGCOMM '03.

[7]  Hector Garcia-Molina,et al.  Routing indices for peer-to-peer systems , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[8]  David R. Karger,et al.  On the Feasibility of Peer-to-Peer Web Indexing and Search , 2003, IPTPS.

[9]  Sandhya Dwarkadas,et al.  Peer-to-peer information retrieval using self-organizing semantic overlay networks , 2003, SIGCOMM '03.

[10]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[11]  Gurmeet Singh Manku,et al.  SETS: search enhanced by topic segmentation , 2003, SIGIR.

[12]  Ion Stoica,et al.  The Case for a Hybrid P2P Search Infrastructure , 2004, IPTPS.

[13]  Elizabeth R. Jessup,et al.  Matrices, Vector Spaces, and Information Retrieval , 1999, SIAM Rev..

[14]  Anjali Gupta,et al.  Efficient Routing for Peer-to-Peer Overlays , 2004, NSDI.

[15]  Peter Triantafillou,et al.  AESOP: Altruism-Endowed Self-organizing Peers , 2004, DBISP2P.

[16]  Chi-Hang Chan,et al.  Advanced Peer Clustering and Firework Query Model in the Peer-to-Peer Network , 2003, WWW.

[17]  Gerhard Weikum,et al.  Improving collection selection with overlap awareness in P2P search engines , 2005, SIGIR '05.

[18]  Hinrich Schütze,et al.  Projections for efficient document clustering , 1997, SIGIR '97.

[19]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[20]  John Kubiatowicz,et al.  Handling churn in a DHT , 2004 .

[21]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[22]  Ka Boon Ng,et al.  Peer Clustering and Firework Query Model , 2002 .

[23]  Amin Vahdat,et al.  Efficient Peer-to-Peer Keyword Searching , 2003, Middleware.

[24]  Richard P. Martin,et al.  PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities , 2003, High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on.

[25]  Subbarao Kambhampati,et al.  Improving text collection selection with coverage and overlap statistics , 2005, WWW '05.

[26]  Yiming Hu,et al.  Integrating semantics-based access mechanisms with P2P file systems , 2003, Proceedings Third International Conference on Peer-to-Peer Computing (P2P2003).

[27]  Antony I. T. Rowstron,et al.  Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems , 2001, Middleware.

[28]  Krishna P. Gummadi,et al.  Measuring and analyzing the characteristics of Napster and Gnutella hosts , 2003, Multimedia Systems.

[29]  Chris Buckley,et al.  Improving automatic query expansion , 1998, SIGIR '98.

[30]  Ben Y. Zhao,et al.  An Infrastructure for Fault-tolerant Wide-area Location and Routing , 2001 .

[31]  Donna K. Harman,et al.  The Text REtrieval Conference (TREC) , 1999, NTCIR.

[32]  Chris Buckley,et al.  Implementation of the SMART Information Retrieval System , 1985 .

[33]  Edith Cohen,et al.  Associative search in peer to peer networks: harnessing latent semantics , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[34]  Peter Triantafillou,et al.  Towards High Performance Peer-to-Peer Content and Resource Sharing Systems , 2003, CIDR.

[35]  Sandhya Dwarkadas,et al.  On scaling latent semantic indexing for large peer-to-peer systems , 2004, SIGIR '04.