Precision prediction based on ranked list coherence

We introduce a statistical measure of the coherence of a list of documents, called the clarity score. Starting with a document list ranked by the query-likelihood retrieval model, we demonstrate the score's relationship to query ambiguity with respect to the collection. We also show that the clarity score is correlated with the average precision of a query, and we lay the groundwork for useful predictions by discussing a method for setting decision thresholds automatically. We then extend the basic method to passage-based systems by showing that passage-based clarity scores correlate with average-precision measures of ranked lists of passages, where a passage is judged relevant if it contains correct answer text. Next, we introduce variants of the document-based clarity score that improve its robustness, applicability, and predictive ability. In particular, we introduce the ranked list clarity score, which can be computed from only a ranked list of documents, and the weighted clarity score, in which query terms contribute more than other terms. Finally, we present an approach, built on the techniques described earlier, to predicting which queries will perform poorly under query expansion.
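
The clarity score is commonly formulated as the relative entropy (KL divergence) between a query language model, estimated from the top-ranked documents, and the collection language model. The Python sketch below illustrates that computation under simple assumptions: term-frequency statistics over tokenized documents, Jelinek-Mercer smoothing of document models, and a query-likelihood posterior over the top-ranked documents. The function name, data layout, and smoothing parameter are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def clarity_score(query_terms, top_docs, collection_tf, collection_len, lam=0.6):
    """Clarity score (in bits): KL divergence between a query language model
    estimated from the top-ranked documents and the collection language model.

    query_terms   : list of query tokens
    top_docs      : list of documents, each a list of tokens (the top-k ranked docs)
    collection_tf : Counter of term frequencies over the whole collection
    collection_len: total number of tokens in the collection
    lam           : Jelinek-Mercer smoothing weight (illustrative value)
    """
    def p_coll(w):
        # Collection model, floored to avoid log(0) for unseen terms.
        return collection_tf.get(w, 0) / collection_len or 1e-12

    # P(w | D): smoothed document language models for the top-ranked documents.
    doc_models = []
    for doc in top_docs:
        tf, dl = Counter(doc), len(doc)
        doc_models.append(lambda w, tf=tf, dl=dl:
                          lam * tf.get(w, 0) / dl + (1 - lam) * p_coll(w))

    # P(D | Q): query likelihood, normalized over the top-ranked documents.
    likelihoods = [math.prod(pd(q) for q in query_terms) for pd in doc_models]
    total = sum(likelihoods) or 1.0
    posteriors = [lk / total for lk in likelihoods]

    # P(w | Q) = sum_D P(w | D) P(D | Q), over the vocabulary of the top documents.
    vocab = {w for doc in top_docs for w in doc}
    p_wq = {w: sum(pd(w) * pdq for pd, pdq in zip(doc_models, posteriors))
            for w in vocab}

    # Clarity = sum_w P(w | Q) * log2( P(w | Q) / P_coll(w) )
    return sum(p * math.log2(p / p_coll(w)) for w, p in p_wq.items() if p > 0)
```

In this formulation, a low score means the query model stays close to the collection model, which is the signature of an ambiguous query that is likely to retrieve an incoherent document list; decision thresholds for precision prediction can then be set on this score.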
