Axiomatic Analysis of Smoothing Methods in Language Models for Pseudo-Relevance Feedback

Pseudo-Relevance Feedback (PRF) is an important general technique for improving retrieval effectiveness without requiring any user effort. Several state-of-the-art PRF models are based on the language modeling approach where a query language model is learned based on feedback documents. In all these models, feedback documents are represented with unigram language models smoothed with a collection language model. While collection language model-based smoothing has proven both effective and necessary in using language models for retrieval, we use axiomatic analysis to show that this smoothing scheme inherently causes the feedback model to favor frequent terms and thus violates the IDF constraint needed to ensure selection of discriminative feedback terms. To address this problem, we propose replacing collection language model-based smoothing in the feedback stage with additive smoothing, which is analytically shown to select more discriminative terms. Empirical evaluation further confirms that additive smoothing indeed significantly outperforms collection-based smoothing methods in multiple language model-based PRF models.
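To make the contrast concrete, here is a minimal Python sketch of the two standard estimators the abstract refers to: Jelinek-Mercer interpolation with the collection language model, p(w|d) = (1 − λ)·c(w,d)/|d| + λ·p(w|C), versus additive smoothing, p(w|d) = (c(w,d) + δ)/(|d| + δ·|V|). The toy corpus, the parameter values (λ = 0.5, δ = 0.1), and the function names are illustrative assumptions, not the paper's implementation; the sketch only shows how collection-based smoothing shifts relatively more probability mass toward frequent terms in a feedback document than additive smoothing does.

```python
from collections import Counter

def collection_lm(docs):
    """Maximum-likelihood collection language model p(w|C)."""
    counts = Counter()
    for d in docs:
        counts.update(d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def jelinek_mercer(doc, p_coll, lam=0.5):
    """Collection-based smoothing: p(w|d) = (1 - lam) * c(w,d)/|d| + lam * p(w|C)."""
    counts = Counter(doc)
    n = len(doc)
    return {w: (1 - lam) * counts[w] / n + lam * p_coll[w] for w in p_coll}

def additive(doc, vocab, delta=0.1):
    """Additive smoothing: p(w|d) = (c(w,d) + delta) / (|d| + delta * |V|)."""
    counts = Counter(doc)
    denom = len(doc) + delta * len(vocab)
    return {w: (counts[w] + delta) / denom for w in vocab}

if __name__ == "__main__":
    # Toy collection: "the" is frequent everywhere, "smoothing" is discriminative.
    collection = [
        "the cat sat on the mat".split(),
        "the dog ate the bone".split(),
        "smoothing helps retrieval models".split(),
    ]
    feedback_doc = "the smoothing of the language model".split()

    p_c = collection_lm(collection)
    vocab = set(p_c) | set(feedback_doc)
    # Extend the collection model to the full vocabulary with a tiny floor
    # for terms that never occur in the collection.
    p_c = {w: p_c.get(w, 1e-6) for w in vocab}

    jm = jelinek_mercer(feedback_doc, p_c)
    add = additive(feedback_doc, vocab)

    for w in ("the", "smoothing"):
        print(f"{w:>10}  JM: {jm[w]:.4f}  additive: {add[w]:.4f}")
```

Running the sketch, the ratio of p("the"|d) to p("smoothing"|d) is noticeably larger under the Jelinek-Mercer estimate than under the additive one, which is the frequent-term bias that the axiomatic analysis attributes to collection language model-based smoothing in the feedback stage.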
