Distributed Ranked Search

Peer-to-peer (P2P) deployments are a natural infrastructure for building distributed search networks. Existing proposals support locating and retrieving all results for a query, but lack the information necessary to rank them. Users, however, are primarily interested in the most relevant results, not in every possible result. Using random sampling, we extend a class of well-known information retrieval ranking algorithms so that they can be applied in this decentralized setting. We analyze the overhead of our approach and quantify how the system scales with an increasing number of documents, system size, document-to-node mapping (uniform versus non-uniform), and type of query (rare versus popular terms). Our analysis and simulations show that (a) these extensions are efficient and scale to large systems with little overhead, and (b) the accuracy of the results obtained using distributed ranking is comparable to that of a centralized implementation.
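The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a TF-IDF style ranker whose global statistic (inverse document frequency) is estimated from a random sample of nodes instead of from the whole network. The names `estimate_idf` and `exact_idf`, the in-memory node representation, and the add-one smoothing are all illustrative assumptions.

```python
import math
import random


def estimate_idf(nodes, term, sample_size, rng):
    """Estimate the inverse document frequency of `term` from sampled nodes.

    `nodes` is a list of nodes; each node is a list of documents; each
    document is a set of terms. This layout is a hypothetical stand-in
    for querying real peers over the network.
    """
    # Draw nodes uniformly at random, without replacement.
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    docs_seen = sum(len(node) for node in sample)
    docs_with_term = sum(1 for node in sample for doc in node if term in doc)
    # Add-one smoothing (an assumption) keeps the ratio finite when the
    # term does not occur in any sampled document.
    return math.log((docs_seen + 1) / (docs_with_term + 1))


def exact_idf(nodes, term):
    """Centralized baseline: the same statistic computed over every node."""
    return estimate_idf(nodes, term, len(nodes), random.Random(0))
```

Comparing `estimate_idf(nodes, term, k, random.Random(seed))` against `exact_idf(nodes, term)` for growing `k` gives a rough feel for how sample size trades off against ranking accuracy, which is the kind of question the abstract's analysis addresses.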
