Kullback-Leibler Divergence Revisited

The KL divergence is the most commonly used measure for comparing query and document language models in the language modeling approach to ad hoc retrieval. Since KL is rank equivalent to a specific weighted geometric mean, we examine alternative weighted means for language-model comparison, as well as alternative divergence measures. The study includes an analysis of the inverse document frequency (IDF) effect of the language-model comparison methods. Empirical evaluation, performed with different types of queries (short and verbose) and query-model induction approaches, shows that some of these methods outperform the KL divergence in some settings.
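The rank equivalence mentioned above can be illustrated with a small sketch. For a fixed query model q, the negative KL divergence from q to a document model d equals the query entropy (constant across documents) plus the sum over terms of q(w) log d(w), and that sum is the logarithm of the geometric mean of the document probabilities weighted by q(w). The toy vocabulary, the probabilities, and the function names below are illustrative assumptions, not taken from the paper; document models are assumed to be smoothed so that no probability is zero.

```python
import math

def neg_kl(query_model, doc_model):
    # -KL(q || d) = -sum_w q(w) * log(q(w) / d(w)); higher means more similar.
    return -sum(q * math.log(q / doc_model[w])
                for w, q in query_model.items() if q > 0)

def weighted_geometric_mean(query_model, doc_model):
    # prod_w d(w) ** q(w), computed in log space for numerical stability.
    return math.exp(sum(q * math.log(doc_model[w])
                        for w, q in query_model.items() if q > 0))

# Hypothetical query and (already smoothed) document language models.
query = {"kl": 0.5, "divergence": 0.3, "retrieval": 0.2}
docs = {
    "d1": {"kl": 0.4, "divergence": 0.4, "retrieval": 0.2},
    "d2": {"kl": 0.1, "divergence": 0.2, "retrieval": 0.7},
    "d3": {"kl": 0.3, "divergence": 0.3, "retrieval": 0.4},
}

def rank(score):
    # Sort document ids by descending score for the fixed query.
    return sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)

rank_by_kl = rank(neg_kl)
rank_by_mean = rank(weighted_geometric_mean)
print(rank_by_kl, rank_by_mean)
```

Both scoring functions order the documents identically because they differ only by a monotone transformation (the exponential) and a query-dependent constant (the query entropy). Swapping `weighted_geometric_mean` for another weighted mean, such as an arithmetic or harmonic one, would give the kind of alternative comparison method the abstract refers to, though the specific means studied in the paper are not listed here.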
