Paper information - Predicting the performance of linearly combined IR systems

Predicting the performance of linearly combined IR systems

We introduce a new technique for analyzing combination models. The technique allows us to make qualitative conclusions about which IR systems should be combined. We achieve this by using a linear regression to accurately (r² = 0.98) predict the performance of the combined system based on quantitative measurements of individual component systems taken from TREC5. When applied to a linear model (weighted sum of relevance scores), the technique supports several previously suggested hypotheses: one should maximize both the individual systems’ performances and the overlap of relevant documents between systems, while minimizing the overlap of nonrelevant documents. It also suggests new conclusions: both systems should distribute scores similarly, but not rank relevant documents similarly. It furthermore suggests that the linear model is only able to exploit a fraction of the benefit possible from combination. The technique is general in nature and capable of pointing out the strengths and weaknesses of any given combination approach.
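As a rough illustration of the linear model named in the abstract (a weighted sum of relevance scores) and of the overlap quantities it mentions, the following is a minimal sketch only. The function names, the default weights, the treatment of a missing document as score 0.0, and the overlap ratio 2·|common| / (|A| + |B|) are all assumptions made for this example; they are not taken from the paper and may differ from the measures the authors actually used.

```python
def combine_linear(scores_a, scores_b, weight_a=0.5, weight_b=0.5):
    """Rank documents by a weighted sum of two systems' relevance scores.

    scores_a / scores_b map document id -> relevance score.
    Assumption: a document missing from one system contributes 0.0 there.
    """
    doc_ids = set(scores_a) | set(scores_b)
    combined = {
        doc: weight_a * scores_a.get(doc, 0.0) + weight_b * scores_b.get(doc, 0.0)
        for doc in doc_ids
    }
    # Highest combined score first; ties broken by document id for determinism.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


def overlap(retrieved_a, retrieved_b, relevant):
    """Return (relevant_overlap, nonrelevant_overlap) for two retrieved sets.

    Assumption: overlap is measured as 2 * |common| / (|A| + |B|),
    computed separately for relevant and nonrelevant documents.
    """
    def ratio(set_a, set_b):
        total = len(set_a) + len(set_b)
        return 2 * len(set_a & set_b) / total if total else 0.0

    rel_a, rel_b = retrieved_a & relevant, retrieved_b & relevant
    non_a, non_b = retrieved_a - relevant, retrieved_b - relevant
    return ratio(rel_a, rel_b), ratio(non_a, non_b)


# Hypothetical toy data, invented purely for illustration.
scores_a = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
scores_b = {"d2": 0.8, "d3": 0.7, "d4": 0.2}
relevant = {"d2", "d4"}

ranking = combine_linear(scores_a, scores_b)
rel_overlap, nonrel_overlap = overlap(set(scores_a), set(scores_b), relevant)
```

Under these assumptions, a pair of systems looks promising for combination when `rel_overlap` is high and `nonrel_overlap` is low, which mirrors the hypotheses the abstract says the regression supports.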

Garrison W. Cottrell | Christopher C. Vogt
