On Score Distributions and Relevance

We discuss the idea of modelling the statistical distributions of document scores separately for documents classified as relevant and as non-relevant. Various combinations of standard statistical distributions have been used for this purpose. Theoretical considerations reveal problems with some of these pairings. Specifically, we revisit a generalisation of the well-known inverse relationship between recall and precision: some choices of pairs of distributions violate this generalised relationship. We identify those choices and the violations, and explore some of the consequences of this theoretical view.
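To make the idea of a violation concrete, the following is a minimal sketch, not the analysis from this paper. It assumes that the generalised relationship is read as "precision should not rise as recall rises when the score threshold is lowered", and it assumes two illustrative pairings with hypothetical parameters: relevant scores Normal(2, 1) with non-relevant scores Exponential(1) (a normal/exponential pairing used in earlier score-distribution work), and relevant scores Normal(1, 1) with non-relevant scores Normal(0, 1). The function names and the document counts are hypothetical. The expected recall at a threshold is the relevant score distribution's tail probability, the fallout is the non-relevant tail probability, and precision follows from the assumed numbers of relevant and non-relevant documents.

```python
import math

def normal_tail(t, mu, sigma):
    """P(S > t) for S ~ Normal(mu, sigma^2); erfc keeps precision in the far tail."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def exponential_tail(t, rate):
    """P(S > t) for S ~ Exponential(rate), supported on [0, inf)."""
    return 1.0 if t <= 0 else math.exp(-rate * t)

def recall_precision_curve(rel_tail, nonrel_tail, n_rel, n_nonrel, thresholds):
    """Expected (recall, precision) pairs, one per score threshold.

    Thresholds are given from high to low, so recall increases along the list.
    """
    curve = []
    for t in thresholds:
        recall = rel_tail(t)        # fraction of relevant docs scoring above t
        fallout = nonrel_tail(t)    # fraction of non-relevant docs scoring above t
        retrieved_rel = n_rel * recall
        retrieved = retrieved_rel + n_nonrel * fallout
        precision = retrieved_rel / retrieved if retrieved > 0 else float("nan")
        curve.append((recall, precision))
    return curve

def precision_rises_with_recall(curve, tol=1e-12):
    """True if precision ever increases while recall increases (a violation)."""
    return any(p2 > p1 + tol for (_, p1), (_, p2) in zip(curve, curve[1:]))

# Thresholds from 8.0 down to 0.0 in steps of 0.1 (illustrative choice).
thresholds = [8.0 - 0.1 * i for i in range(81)]

# Hypothetical pairing: relevant ~ Normal(2, 1), non-relevant ~ Exponential(1).
norm_exp = recall_precision_curve(
    lambda t: normal_tail(t, 2.0, 1.0),
    lambda t: exponential_tail(t, 1.0),
    n_rel=100, n_nonrel=10_000, thresholds=thresholds)

# Hypothetical pairing: relevant ~ Normal(1, 1), non-relevant ~ Normal(0, 1).
norm_norm = recall_precision_curve(
    lambda t: normal_tail(t, 1.0, 1.0),
    lambda t: normal_tail(t, 0.0, 1.0),
    n_rel=100, n_nonrel=10_000, thresholds=thresholds)

print(precision_rises_with_recall(norm_exp))   # True
print(precision_rises_with_recall(norm_norm))  # False
```

In the normal/exponential case the normal tail decays faster than the exponential one, so at very high thresholds the non-relevant documents dominate and precision climbs as recall grows from zero. In the equal-variance normal case the likelihood ratio is monotone in the score, and precision falls steadily as recall grows.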
