The Smoothed Dirichlet Distribution: Explaining KL-Divergence Based Ranking in Information Retrieval

In this work, we analyze the popular KL-divergence ranking function in information retrieval. We uncover the generative distribution underlying this ranking function, namely the Smoothed Dirichlet distribution, and show that it models term occurrence distributions much better than the multinomial does. This offers, for the first time, an explanation for the success of the KL-divergence ranking function. We present theoretically motivated approximations to the distribution that lead to a closed-form maximum likelihood solution, much like the multinomial's, making it well suited to online IR tasks. We use the new distribution to construct a new, well-motivated ad-hoc retrieval algorithm. Our experiments show that this algorithm performs at least as well as similar algorithms that employ cross-entropy ranking. Owing to its consistent generative framework, it also offers additional flexibility, e.g., in handling a mixture of true and pseudo relevance feedback.
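For readers unfamiliar with the baseline being explained, KL-divergence ranking scores a document by the negative KL divergence from a query language model to a smoothed document language model. The sketch below illustrates that standard baseline only. It is not the Smoothed Dirichlet model. It assumes a maximum-likelihood query model and Dirichlet-prior smoothing of the document model with a hypothetical parameter `mu`; the function names and toy documents are illustrative. It also checks that the KL score and the cross-entropy score differ only by the query's entropy, which is why ranking by either gives the same order.

```python
import math
from collections import Counter


def doc_model(doc_tokens, coll_counts, coll_len, mu=10.0):
    """Smoothed p(w|d). Dirichlet-prior smoothing and mu are assumptions for illustration."""
    tf = Counter(doc_tokens)
    dlen = len(doc_tokens)

    def p(w):
        p_coll = coll_counts.get(w, 0) / coll_len  # background model p(w|C)
        return (tf[w] + mu * p_coll) / (dlen + mu)

    return p


def query_model(query_tokens):
    """Maximum-likelihood query model p(w|q)."""
    tf = Counter(query_tokens)
    return {w: c / len(query_tokens) for w, c in tf.items()}


def neg_kl(query_tokens, doc_tokens, coll_counts, coll_len, mu=10.0):
    """Score = -KL(theta_q || theta_d), summed over terms with p(w|q) > 0."""
    pq = query_model(query_tokens)
    pd = doc_model(doc_tokens, coll_counts, coll_len, mu)
    score = 0.0
    for w, p in pq.items():
        if pd(w) == 0.0:
            raise ValueError(f"query term {w!r} never occurs in the collection")
        score -= p * math.log(p / pd(w))
    return score


def neg_cross_entropy(query_tokens, doc_tokens, coll_counts, coll_len, mu=10.0):
    """Score = sum_w p(w|q) log p(w|d); equals neg_kl minus the query entropy H(q)."""
    pq = query_model(query_tokens)
    pd = doc_model(doc_tokens, coll_counts, coll_len, mu)
    return sum(p * math.log(pd(w)) for w, p in pq.items())


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "kl divergence ranking for information retrieval".split(),
    "d2": "dirichlet distribution models word burstiness".split(),
    "d3": "multinomial language model for retrieval ranking".split(),
}
coll_counts = Counter(w for toks in docs.values() for w in toks)
coll_len = sum(coll_counts.values())
query = "dirichlet ranking".split()

kl_scores = {d: neg_kl(query, t, coll_counts, coll_len) for d, t in docs.items()}
ce_scores = {d: neg_cross_entropy(query, t, coll_counts, coll_len) for d, t in docs.items()}
ranking = sorted(docs, key=kl_scores.get, reverse=True)
print(ranking)
```

Because `neg_kl` and `neg_cross_entropy` differ by the query entropy, which is the same constant for every document, the two produce identical rankings. This is the sense in which KL-divergence ranking and cross-entropy ranking are interchangeable.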
