A Language Modeling Framework for Selective Query Expansion

Abstract: Query expansion is a well-known technique that has been shown to improve average retrieval performance. However, it has not been adopted in many operational systems because it can greatly degrade the performance of some individual queries. We show how a comparison between language models of the unexpanded and expanded retrieval results can be used to predict when the expanded retrieval has strayed from the original sense of the query. When such straying is detected, the unexpanded results are used; otherwise, the expanded results are used. We evaluate this method and others on a wide variety of TREC collections and show how to automatically compute a decision threshold for a collection. We demonstrate that the method improves both the effectiveness and the reliability of query expansion in information retrieval.
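The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each result list is modeled as an equally weighted mixture of its documents' unigram models, smoothed with a collection model (Jelinek-Mercer, with a hypothetical weight `lam`). A plain KL divergence of the expanded-results model from the unexpanded-results model stands in for the paper's comparison measure. `threshold` is a caller-supplied value rather than the automatically computed per-collection threshold the abstract mentions. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: a document is a list of (already stemmed) terms and a
# language model maps each vocabulary term to a probability.
Doc = List[str]
LanguageModel = Dict[str, float]


def results_language_model(
    docs: Sequence[Doc], collection_lm: LanguageModel, lam: float = 0.6
) -> LanguageModel:
    """Unigram model of a ranked result list (assumed, simplified form).

    Assumptions: every document is non-empty, all its terms appear in
    collection_lm, and each document gets equal weight. Each document's
    maximum-likelihood model is interpolated with the collection model
    (Jelinek-Mercer smoothing), so every term has nonzero probability
    whenever lam < 1 and collection_lm is strictly positive.
    """
    model = {w: 0.0 for w in collection_lm}
    weight = 1.0 / len(docs)
    for doc in docs:
        counts = Counter(doc)
        for w, p_coll in collection_lm.items():
            p_doc = counts[w] / len(doc)
            model[w] += weight * (lam * p_doc + (1.0 - lam) * p_coll)
    return model


def kl_divergence(p: LanguageModel, q: LanguageModel) -> float:
    """KL divergence D(p || q) in bits; q must be positive wherever p is."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def select_results(
    unexpanded_docs: Sequence[Doc],
    expanded_docs: Sequence[Doc],
    collection_lm: LanguageModel,
    threshold: float,
    lam: float = 0.6,
) -> Tuple[str, float]:
    """Return which result list to show and the measured divergence."""
    lm_unexpanded = results_language_model(unexpanded_docs, collection_lm, lam)
    lm_expanded = results_language_model(expanded_docs, collection_lm, lam)
    divergence = kl_divergence(lm_expanded, lm_unexpanded)
    # A large divergence is read as the expanded results having strayed
    # from the original sense of the query, so fall back to unexpanded.
    if divergence > threshold:
        return "unexpanded", divergence
    return "expanded", divergence


if __name__ == "__main__":
    # Toy collection with two senses of "jaguar" (car vs. animal).
    corpus = [
        ["jaguar", "car", "engine"],
        ["jaguar", "cat", "jungle"],
        ["car", "engine", "speed"],
        ["cat", "jungle", "prey"],
    ]
    term_counts = Counter(term for doc in corpus for term in doc)
    total = sum(term_counts.values())
    collection_lm = {w: c / total for w, c in term_counts.items()}

    unexpanded = [corpus[0], corpus[2]]  # car sense
    drifted = [corpus[1], corpus[3]]     # expansion drifted to the animal sense
    choice, div = select_results(unexpanded, drifted, collection_lm, threshold=0.5)
    print(choice, round(div, 3))  # prints: unexpanded 1.0
```

Equal document weights keep the sketch short; weighting each document by its retrieval score would be a natural variant, and the threshold of 0.5 bits in the toy run is arbitrary.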
