Models for metasearch

Given the ranked lists of documents returned by multiple search engines in response to a given query, the problem of metasearch is to combine these lists in a way that optimizes the performance of the combination. This paper makes three contributions to the problem of metasearch: (1) we describe and investigate a metasearch model based on an optimal democratic voting procedure, the Borda Count; (2) we describe and investigate a metasearch model based on Bayesian inference; and (3) we describe and investigate a model for obtaining upper bounds on the performance of metasearch algorithms. Our experimental results show that metasearch algorithms based on the Borda and Bayesian models usually outperform the best input system and are competitive with, and often outperform, existing metasearch strategies. Finally, our initial upper bounds demonstrate that there is much to learn about the limits of the performance of metasearch.
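To make the voting idea concrete, here is a minimal sketch of Borda Count fusion in which each search engine acts as a voter and each document as a candidate. It is an illustration under stated assumptions, not the paper's exact procedure: the function name `borda_fuse`, the rule that documents an engine did not return split that engine's leftover points evenly, and the tie-break by document identifier are all choices made for this example.

```python
from collections import defaultdict


def borda_fuse(ranked_lists):
    """Fuse ranked lists of document ids with a basic Borda Count.

    Hypothetical helper for illustration; each input list is assumed to
    contain no duplicate ids.
    """
    # Candidate pool: every document returned by at least one engine.
    candidates = set()
    for ranking in ranked_lists:
        candidates.update(ranking)
    c = len(candidates)

    scores = defaultdict(float)
    for ranking in ranked_lists:
        # The top-ranked document gets c points, the next c - 1, and so on.
        for position, doc in enumerate(ranking):
            scores[doc] += c - position

        # Assumption: documents this engine did not return share the
        # points left over for the unfilled positions evenly.
        unranked = candidates - set(ranking)
        if unranked:
            leftover = sum(range(1, c - len(ranking) + 1))
            share = leftover / len(unranked)
            for doc in unranked:
                scores[doc] += share

    # Highest total first; ties are broken by document id (an assumption).
    return sorted(candidates, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    engines = [
        ["a", "b", "c"],
        ["b", "a", "d"],
        ["c", "b"],
    ]
    print(borda_fuse(engines))
```

With the three toy lists above, the totals are b = 10, a = 8.5, c = 7, and d = 4.5, so the fused ranking is b, a, c, d. Document b wins even though only one engine ranked it first, because every engine placed it near the top.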
