Bookmark-driven Query Routing in Peer-to-Peer Web Search

We consider the problem of collaborative Web search and query routing strategies in a peer-to-peer (P2P) environment. In our architecture, every peer has a full-fledged search engine with a (thematically focused) crawler and a local index whose contents may be tailored to the user’s specific interest profile. Peers are autonomous and post meta-information about their bookmarks and index lists to a global directory, which is implemented efficiently and in a decentralized manner using Chord-style distributed hash tables. A query posed by one peer is first evaluated locally; if the result is unsatisfactory, the query is forwarded to selected peers. These peers are chosen based on a benefit/cost measure, where benefit reflects the thematic similarity of peers’ interest profiles, derived from bookmarks, and cost captures estimated peer load and response time. The meta-information needed for these query routing decisions is looked up efficiently in the global directory; it can also be cached and proactively disseminated for higher availability and reduced network load.
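The benefit/cost selection described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's actual formulation: the `PeerMeta` record, the use of cosine similarity over bookmark-derived term weights as the benefit, the additive cost of load plus response time, and the weights `load_weight` and `time_weight` are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class PeerMeta:
    """Hypothetical directory entry describing one peer."""
    peer_id: str
    profile: Counter       # term weights derived from the peer's bookmarks
    load: float            # estimated load, assumed to lie in [0, 1]
    response_time: float   # estimated response time in seconds


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_query(own_profile, candidates, k=2, load_weight=1.0, time_weight=1.0):
    """Return the ids of up to k peers with the highest benefit/cost ratio."""
    scored = []
    for peer in candidates:
        # Benefit: thematic similarity of interest profiles (assumed: cosine).
        benefit = cosine(own_profile, peer.profile)
        # Cost: assumed additive in load and response time; the constant 1.0
        # keeps the ratio finite for an idle, instantly responding peer.
        cost = 1.0 + load_weight * peer.load + time_weight * peer.response_time
        scored.append((benefit / cost, peer.peer_id))
    # Highest ratio first; ties are broken by peer id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Peers with no thematic overlap are never selected.
    return [pid for score, pid in scored[:k] if score > 0]


own = Counter({"p2p": 2, "search": 1})
peers = [
    PeerMeta("A", Counter({"p2p": 2, "search": 1}), load=0.9, response_time=2.0),
    PeerMeta("B", Counter({"p2p": 1, "search": 1}), load=0.1, response_time=0.2),
    PeerMeta("C", Counter({"cooking": 3}), load=0.0, response_time=0.1),
]
selected = route_query(own, peers, k=2)
```

In this toy setup, peer B is ranked ahead of peer A even though A's profile matches exactly, because A's assumed load and response time make it more expensive to contact; peer C is excluded because it shares no terms with the querying peer.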
