Exploiting the Properties of Query Workload and File Name Distributions to Improve P2P Synopsis-Based Searches

Modern P2P systems use hybrid searches to improve search efficiency. They use a synopsis of neighborhood content to determine whether to use a structured or unstructured overlay to satisfy a particular query. Because of its size restrictions, a synopsis cannot hold all the terms from every file in the neighborhood. The challenge is to choose the terms that should be represented in the synopsis. In this work, we investigated the distribution of query terms and file terms in Gnutella networks. We observed a mismatch between the terms that were popular among file names and the terms that were popular among the queries generated by users. Because query behavior changed with time, a synopsis based only on a static set of popular file terms was ill-suited to support efficient searches. We used these observations to design a synopsis creation algorithm that dynamically adapted to the query workload and selected terms for the synopsis to reflect popular terms in both the query workload and the file distribution. Our preliminary experimental analysis showed that our Query-Adaptive synopsis improved search performance over the traditional file-based synopsis model.
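The term-selection idea described above can be illustrated with a minimal sketch. It is not the algorithm from this work: the function names (`tokenize`, `build_synopsis`), the linear scoring rule, and the `query_weight` parameter are all assumptions made for illustration. The sketch ranks the terms found in neighborhood file names by a weighted mix of their popularity in file names and in a recent window of queries, then keeps as many terms as the synopsis can hold.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_synopsis(file_names, recent_queries, capacity, query_weight=0.5):
    """Pick up to `capacity` terms for a neighborhood synopsis.

    Hypothetical scoring rule (an assumption, not the paper's method):
    score = query_weight * query_popularity
            + (1 - query_weight) * file_popularity
    Popularity is the fraction of file names (or queries) containing the term.
    """
    # Count each term at most once per file name or per query.
    file_counts = Counter(t for n in file_names for t in set(tokenize(n)))
    query_counts = Counter(t for q in recent_queries for t in set(tokenize(q)))

    n_files = max(len(file_names), 1)
    n_queries = max(len(recent_queries), 1)

    # Only terms that occur in neighborhood files are candidates, since the
    # synopsis advertises content that the neighborhood actually holds.
    scores = {}
    for term, count in file_counts.items():
        file_pop = count / n_files
        query_pop = query_counts.get(term, 0) / n_queries
        scores[term] = query_weight * query_pop + (1 - query_weight) * file_pop

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:capacity])


if __name__ == "__main__":
    files = [
        "madonna live concert.mp3",
        "live album track.mp3",
        "rare bootleg demo.mp3",
    ]
    queries = ["rare bootleg", "bootleg demo", "madonna"]
    print(build_synopsis(files, queries, capacity=2))
```

Setting `query_weight=0.0` reduces the sketch to a purely file-based synopsis, while rebuilding it periodically over a sliding window of recent queries is one way it could follow a changing workload. A deployed system would likely store the selected terms in a compact structure rather than a plain set.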
