A Statistical View of Binned Retrieval Models

Many traditional information retrieval models, such as BM25 and language modeling, give good retrieval effectiveness but can be difficult to implement efficiently. Recently, document-centric impact models were developed to overcome some of these efficiency issues. However, such models have a number of problems, including poor effectiveness and heuristic term weighting schemes. In this work, we present a statistical view of document-centric impact models. We describe how such models can be treated statistically and propose a supervised parameter estimation technique. We analyze various theoretical and practical aspects of the model and show that weights estimated using our new estimation technique are significantly better than the integer-based weights used in previous studies.
