On the number of terms used in automatic query expansion

This paper investigates the number of expansion terms to use in automatic query expansion by examining the behavior of eight retrieval systems participating in the NRRC Reliable Information Access Workshop. The results demonstrate that current systems obtain nearly all of the benefit available from using a fixed number of expansion terms per topic, but that significant additional improvement would be possible if systems could accurately select the best number of expansion terms on a per-topic basis. When optimizing average effectiveness as measured by mean average precision, using a fixed number of terms increases the score substantially for a small number of topics but has little effect on most topics. The analysis further suggests that when a topic is helped by automatic feedback, the improvement comes from a set of terms that reinforce each other rather than from the system finding a single excellent term.
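As a point of reference for the evaluation measure discussed above, mean average precision (MAP) averages, over topics, the precision observed at each rank where a relevant document is retrieved. The sketch below is a minimal illustration of that standard definition, not code from the paper; the function and variable names are our own.

```python
def average_precision(ranked, relevant):
    """Average precision for one topic: mean of precision@k at each
    rank k where a relevant document appears, normalized by the total
    number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_doc_list, relevant_doc_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because each topic's average precision is bounded by 1 regardless of topic difficulty, a few topics with large gains can raise MAP noticeably even when most topics are unaffected, which is consistent with the per-topic behavior described above.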
