Predicting query performance

We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language used in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents, and we show that they correlate positively with average precision on a variety of TREC test sets. Thus, the clarity score can be used to identify ineffective queries, on average, without relevance information. We develop an algorithm for automatically setting the clarity score threshold that separates queries predicted to perform poorly from acceptable queries, and we validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds and check how often sampling experiments that randomly assign queries to the two classes achieve results as good.
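The clarity score described above is a relative entropy (Kullback-Leibler divergence), clarity(Q) = Σ_w P(w|Q) log2(P(w|Q) / P_coll(w)), summed over the collection vocabulary. The following is a minimal sketch under stated assumptions, not the authors' implementation. It estimates the query model as a mixture of smoothed document models weighted by P(D|Q) ∝ P(Q|D) with a uniform document prior. It uses Jelinek-Mercer smoothing with an assumed λ = 0.6, and it scores every document of a toy collection rather than a set of top-retrieved documents. The function names (`collection_model`, `doc_model`, `query_model`, `clarity`) and the parameter `lam` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating a clarity score; not the authors' code.


def collection_model(docs):
    """Maximum-likelihood unigram model P_coll(w) over every token in the collection."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def doc_model(doc, p_coll, lam=0.6):
    """Jelinek-Mercer smoothed P(w|D) = lam * P_ml(w|D) + (1 - lam) * P_coll(w).

    Assumes doc is non-empty; lam = 0.6 is an illustrative (assumed) value.
    """
    counts = Counter(doc)  # missing words count as 0
    n = len(doc)
    return {w: lam * counts[w] / n + (1 - lam) * pc for w, pc in p_coll.items()}


def query_model(query, docs, p_coll, lam=0.6):
    """P(w|Q) = sum_D P(w|D) P(D|Q), with P(D|Q) proportional to P(Q|D) (uniform prior)."""
    terms = [q for q in query if q in p_coll]  # drop terms unseen in the collection
    if not terms:
        raise ValueError("no query term occurs in the collection")
    models = [doc_model(doc, p_coll, lam) for doc in docs]
    likelihoods = [math.prod(m[q] for q in terms) for m in models]
    z = sum(likelihoods)
    posteriors = [lik / z for lik in likelihoods]  # normalized P(D|Q)
    return {w: sum(p * m[w] for p, m in zip(posteriors, models)) for w in p_coll}


def clarity(query, docs, lam=0.6):
    """Relative entropy (in bits) between the query model and the collection model."""
    p_coll = collection_model(docs)
    p_q = query_model(query, docs, p_coll, lam)
    return sum(pq * math.log2(pq / p_coll[w]) for w, pq in p_q.items() if pq > 0)


if __name__ == "__main__":
    docs = [
        "jaguar car engine speed".split(),
        "jaguar car dealer price".split(),
        "jaguar cat jungle habitat".split(),
        "stock market price index".split(),
    ]
    for q in (["engine"], ["jaguar"]):
        print(q, round(clarity(q, docs), 3))
```

On this toy collection, the specific query "engine" concentrates P(D|Q) on a single document and receives a higher clarity score than the ambiguous "jaguar", whose weight is spread across the car and animal senses.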