Fusion Via a Linear Combination of Scores

We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval (IR) systems. The LC model combines the result lists of multiple IR systems by scoring each document with a weighted sum of the scores from the component systems. We first present both empirical and analytical justification for the hypotheses that such a model should be used only when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to predict the performance of a combined system very accurately. We also derive a formula for a theoretically optimal weighting scheme for combining two systems. We introduce d, the difference between the average score on relevant documents and the average score on nonrelevant documents, as a performance measure that not only allows mathematical reasoning about system performance but also allows the selection of weights that generalize well to new documents. We describe a number of experiments involving large numbers of different IR systems that support these findings.
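As a rough illustration of the two quantities the abstract defines, the sketch below fuses per-system scores with a weighted sum and computes the gap between the mean score on relevant documents and the mean score on nonrelevant documents. It is a minimal sketch, not the authors' implementation: the function names (linear_combination, mean_score_gap, rank), the toy scores, and the weights are all hypothetical, and treating a document missing from a system's list as contributing a score of 0 is an assumption the abstract does not state.

```python
from typing import Dict, List, Set

Scores = Dict[str, float]  # document id -> score assigned by one system


def linear_combination(systems: List[Scores], weights: List[float]) -> Scores:
    """Score each document by a weighted sum of the component systems' scores.

    Assumption (not from the source): a document absent from a system's
    list contributes 0 for that system.
    """
    if len(systems) != len(weights):
        raise ValueError("need exactly one weight per system")
    fused: Scores = {}
    for scores, weight in zip(systems, weights):
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return fused


def mean_score_gap(scores: Scores, relevant: Set[str]) -> float:
    """Average score on relevant documents minus average on nonrelevant ones."""
    rel = [s for doc, s in scores.items() if doc in relevant]
    non = [s for doc, s in scores.items() if doc not in relevant]
    if not rel or not non:
        raise ValueError("need at least one relevant and one nonrelevant document")
    return sum(rel) / len(rel) - sum(non) / len(non)


def rank(scores: Scores) -> List[str]:
    """Order documents by descending score, breaking ties by document id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical toy data: two systems, four documents, d1 and d2 relevant.
system_a = {"d1": 1.0, "d2": 0.5, "d3": 0.25}
system_b = {"d2": 1.0, "d3": 0.5, "d4": 0.25}
relevant = {"d1", "d2"}

fused = linear_combination([system_a, system_b], [0.75, 0.25])
print(rank(fused))
print(mean_score_gap(fused, relevant))
```

In practice the component scores would likely need to be put on a comparable scale before being summed, and the weights would be chosen on training queries rather than fixed by hand; both steps are left out of this sketch.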
