ProbFuse: a probabilistic approach to data fusion

Data fusion is the combination of the results of independent searches on a document collection into a single output result set. It has been shown in the past that this can greatly improve retrieval effectiveness over that of the individual input result sets. This paper presents probFuse, a probabilistic approach to data fusion. ProbFuse assumes that the performance of the individual input systems on a number of training queries is indicative of their future performance. The fused result set is based on probabilities of relevance calculated during this training process. Retrieval experiments using data from the TREC ad hoc collection demonstrate that probFuse achieves results superior to those of the popular CombMNZ fusion algorithm.
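
The abstract describes probFuse only at a high level, so the Python sketch below is one possible reading of it, not the paper's exact specification. Several details are assumptions. Each input ranking is cut into fixed-size segments (the hypothetical parameter `seg_size`). The probability that a document in segment k of system m is relevant is estimated as the fraction of relevant documents in that segment, averaged over the training queries. A document's fused score is the sum, over systems, of that probability divided by k. CombMNZ is included as the baseline: the sum of a document's normalized scores multiplied by the number of systems that returned it. The min-max normalization used here is also an assumption, since other normalizations exist. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

Ranking = List[str]  # document ids, best first


def segment(rank: int, seg_size: int) -> int:
    """1-based segment index for a 0-based rank (fixed-size segments are an assumption)."""
    return rank // seg_size + 1


def train(runs: Dict[str, Dict[str, Ranking]],
          qrels: Dict[str, Set[str]],
          seg_size: int) -> Dict[str, Dict[int, float]]:
    """Estimate P(relevant | system, segment k), averaged over the training queries."""
    probs: Dict[str, Dict[int, float]] = {}
    for system, by_query in runs.items():
        totals: Dict[int, float] = defaultdict(float)
        for qid, ranking in by_query.items():
            relevant = qrels.get(qid, set())
            counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # k -> [relevant, total]
            for rank, doc in enumerate(ranking):
                k = segment(rank, seg_size)
                counts[k][1] += 1
                if doc in relevant:
                    counts[k][0] += 1
            for k, (rel, tot) in counts.items():
                totals[k] += rel / tot  # fraction of segment k that is relevant for this query
        probs[system] = {k: s / len(by_query) for k, s in totals.items()}
    return probs


def prob_fuse(query_runs: Dict[str, Ranking],
              probs: Dict[str, Dict[int, float]],
              seg_size: int) -> Ranking:
    """Score each document by summing P(rel | system, k) / k over the systems that return it."""
    scores: Dict[str, float] = defaultdict(float)
    for system, ranking in query_runs.items():
        for rank, doc in enumerate(ranking):
            k = segment(rank, seg_size)
            scores[doc] += probs.get(system, {}).get(k, 0.0) / k
    return sorted(scores, key=lambda d: (-scores[d], d))


def comb_mnz(query_scores: Dict[str, Dict[str, float]]) -> Ranking:
    """CombMNZ: sum of min-max normalized scores times the number of systems returning the doc."""
    total: Dict[str, float] = defaultdict(float)
    hits: Dict[str, int] = defaultdict(int)
    for scores in query_scores.values():
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc, s in scores.items():
            total[doc] += (s - lo) / span
            hits[doc] += 1
    fused = {d: total[d] * hits[d] for d in total}
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Toy training data: two systems, two training queries (entirely made up).
    runs = {
        "A": {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6", "d7", "d8"]},
        "B": {"q1": ["d3", "d4", "d1", "d2"], "q2": ["d8", "d7", "d6", "d5"]},
    }
    qrels = {"q1": {"d1", "d2"}, "q2": {"d5"}}
    probs = train(runs, qrels, seg_size=2)
    print(probs)  # {'A': {1: 0.75, 2: 0.0}, 'B': {1: 0.0, 2: 0.75}}
    print(prob_fuse({"A": ["x1", "x2", "x3", "x4"], "B": ["x3", "x1", "x4", "x2"]}, probs, 2))
    # ['x2', 'x1', 'x4', 'x3']
    print(comb_mnz({"A": {"x1": 10.0, "x2": 8.0, "x3": 6.0},
                    "B": {"x3": 3.0, "x1": 2.0, "x4": 1.0}}))
    # ['x1', 'x3', 'x2', 'x4']
```

The toy run illustrates the contrast. System A placed relevant documents early during training, so probFuse trusts A's top segment. CombMNZ uses no training and instead rewards documents that several systems return with high scores. The choice of `seg_size` controls how coarse the learned probabilities are; the value 2 here is only for illustration.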
