Modeling score distributions in information retrieval

We review the history of modeling score distributions, focusing on the normal-exponential mixture and examining the theoretical and empirical evidence supporting its use. We discuss previously suggested conditions that valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses concerning the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. Of all the mixtures suggested in the past, the current theoretical argument points to the two-gamma mixture as the most likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing that vector space (geometric) models and BM25 are ‘friendly’ to the normal-exponential, and that the mixture's non-convexity problem is not severe in practice. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.
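To make the non-convexity issue concrete, the following is a minimal sketch, assuming relevant scores follow a normal density with mean mu and standard deviation sigma, and non-relevant scores follow an exponential density with rate lam on non-negative scores. The function names and the parameter values are illustrative assumptions, not taken from the paper. Under these assumptions the log-likelihood ratio of relevant to non-relevant has derivative lam - (s - mu) / sigma^2, so it stops increasing at s = mu + lam * sigma^2; above that score the modeled probability of relevance falls as the score rises, which is where the recall-fallout curve loses convexity.

```python
import math


def recall(t, mu, sigma):
    """P(score > t | relevant) under a normal model for relevant scores."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def fallout(t, lam):
    """P(score > t | non-relevant) under an exponential model (scores >= 0)."""
    return math.exp(-lam * max(t, 0.0))


def log_likelihood_ratio(s, mu, sigma, lam):
    """log[ f_rel(s) / f_non(s) ] for the normal-exponential pair."""
    log_rel = -0.5 * ((s - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    log_non = math.log(lam) - lam * s
    return log_rel - log_non


def convexity_breakpoint(mu, sigma, lam):
    """Score above which the likelihood ratio starts to decrease."""
    return mu + lam * sigma ** 2


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    mu, sigma, lam = 2.0, 1.0, 1.5
    s_star = convexity_breakpoint(mu, sigma, lam)
    print("breakpoint score:", s_star)
    print("recall at breakpoint:", recall(s_star, mu, sigma))
    print("fallout at breakpoint:", fallout(s_star, lam))
```

The recall at the breakpoint gives a rough sense of how much of the relevant mass lies in the non-convex region; when it is small, the problem affects only the very top of the ranking, which is consistent with the claim that it is not severe in practice.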
