Difficulty-aware Hybrid Search in Peer-to-Peer Networks

By combining an unstructured protocol with a DHT-based global index, hybrid peer-to-peer (P2P) search improves search efficiency in terms of both query recall and response time. The key challenge in hybrid search is estimating the number of peers that can answer a given query. Existing approaches assume that this number can be obtained directly from item popularity. In this work, we show that this assumption does not always hold: previous designs cannot distinguish whether the items related to a query are spread across many peers or concentrated in a few. To address this issue, we propose QRank, a difficulty-aware hybrid search scheme that ranks queries by weighting their keywords according to term frequency. Based on these rank values, QRank selects an appropriate search strategy for each query. We evaluate this design through comprehensive trace-driven simulations. The results show that, compared with existing approaches, QRank significantly improves search quality while reducing system traffic cost.
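The abstract states only that QRank weights keywords by term frequency and uses the resulting rank to pick a search strategy. As a rough illustration of that idea, the Python sketch below assumes an IDF-style keyword weight computed over a hypothetical sample of shared filenames, sums the weights into a query rank, and routes high-rank (hard, rare-keyword) queries to the DHT index while flooding low-rank ones through the unstructured overlay. The weighting formula, the threshold, the sample data, and the names `term_frequencies`, `query_rank` and `choose_strategy` are all assumptions for illustration, not QRank's actual definitions.

```python
import math
from collections import Counter


def term_frequencies(filenames):
    """Count, for each keyword, how many sampled filenames contain it.

    Hypothetical input: a sample of shared filenames collected by a peer.
    """
    counts = Counter()
    for name in filenames:
        # Count each keyword at most once per filename.
        counts.update(set(name.lower().split()))
    return counts


def query_rank(query, counts, n_samples):
    """Hypothetical difficulty score: sum of IDF-style keyword weights.

    Rare keywords get large weights, so a query made of rare keywords
    receives a high rank (treated here as a hard query).
    """
    rank = 0.0
    for term in set(query.lower().split()):
        tf = counts.get(term, 0)
        # +1 smoothing keeps unseen terms finite; the weight is >= 0
        # whenever tf <= n_samples.
        rank += math.log((n_samples + 1) / (tf + 1))
    return rank


def choose_strategy(query, counts, n_samples, threshold):
    """Flood easy (low-rank) queries; send hard (high-rank) ones to the DHT."""
    if query_rank(query, counts, n_samples) > threshold:
        return "dht"
    return "flooding"


# Usage with a tiny, made-up sample of filenames.
sample = [
    "linux kernel source",
    "linux distro iso",
    "linux kernel patch",
    "rare bootleg concert recording",
]
counts = term_frequencies(sample)
print(choose_strategy("linux kernel", counts, len(sample), threshold=1.5))     # prints flooding
print(choose_strategy("bootleg concert", counts, len(sample), threshold=1.5))  # prints dht
```

Summing the weights means that adding a keyword never lowers a query's rank, which fits the intuition that conjunctive queries over rare terms match items held by few peers. In practice the threshold would have to be calibrated against measured recall and traffic rather than fixed by hand as it is here.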
