Predicting Query Performance Directly from Score Distributions

Query performance prediction (QPP) has received much attention over the past decade. However, many existing frameworks and approaches are largely heuristic. In this paper, we develop a principled framework that models the document score distribution in order to predict query performance directly. In particular, we (1) show how a standard performance measure (e.g. average precision) can be inferred from a document score distribution; (2) develop QPP techniques that automatically estimate the parameters of the document score distribution (i.e. a mixture model) when relevance information is unknown, so that the resulting approaches aim to estimate average precision directly; and (3) provide a detailed analysis of one of these approaches, which shows that only two parameters of the five-parameter mixture distribution are of practical importance.
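To make point (1) concrete, the following is a minimal sketch of how average precision might be computed from a two-component score mixture. It rests on assumptions that are not stated in the abstract: both components are taken to be Gaussian (giving five parameters: a mixing weight, two means, and two standard deviations), and average precision is approximated as the integral of precision over recall. The function and parameter names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_sf(x, mu, sigma):
    """Survival function P(S > x) of a normal distribution."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def average_precision_from_mixture(lam, mu_rel, sd_rel, mu_non, sd_non,
                                   steps=20000):
    """Approximate AP implied by a two-Gaussian score mixture (an assumed model).

    lam            : mixing weight, i.e. the fraction of documents that are relevant
    mu_rel, sd_rel : mean and standard deviation of relevant scores
    mu_non, sd_non : mean and standard deviation of non-relevant scores

    AP is approximated as the integral over score thresholds t of
    precision(t) * f_rel(t) dt, using a midpoint rule.
    """
    lo = min(mu_rel - 8.0 * sd_rel, mu_non - 8.0 * sd_non)
    hi = max(mu_rel + 8.0 * sd_rel, mu_non + 8.0 * sd_non)
    width = (hi - lo) / steps
    ap = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * width
        # Expected mass of relevant / non-relevant documents scoring above t.
        rel_above = lam * normal_sf(t, mu_rel, sd_rel)
        non_above = (1.0 - lam) * normal_sf(t, mu_non, sd_non)
        total = rel_above + non_above
        if total <= 0.0:
            continue  # nothing retrieved above this threshold
        precision = rel_above / total
        # f_rel(t) dt is the increment in recall at threshold t.
        ap += precision * normal_pdf(t, mu_rel, sd_rel) * width
    return ap
```

Under this sketch, identical relevant and non-relevant distributions yield an AP equal to the mixing weight (a random ranking), while AP approaches 1 as the relevant scores move well above the non-relevant ones.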
