Model-based feedback in the language modeling approach to information retrieval

The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the original query is usually literally expanded by adding additional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback in the language modeling approach. Specifically, we treat feedback as updating the query language model based on the extra evidence carried by the feedback documents. Such a model-based feedback strategy easily fits into an extension of the language modeling approach. We propose and evaluate two different approaches to updating a query language model based on feedback documents, one based on a generative probabilistic model of feedback documents and one based on minimization of the KL-divergence over feedback documents. Experiment results show that both approaches are effective and outperform the Rocchio feedback approach.

[1]  Djoerd Hiemstra,et al.  Twenty-One at TREC7: Ad-hoc and Cross-Language Track , 1998, TREC.

[2]  Djoerd Hiemstra,et al.  Using language models for information retrieval , 2001 .

[3]  W. Bruce Croft,et al.  Cluster-based language models for distributed retrieval , 1999, SIGIR '99.

[4]  W. Bruce Croft,et al.  A general language model for information retrieval (poster abstract) , 1999, SIGIR '99.

[5]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[6]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[7]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[8]  Stephen E. Robertson,et al.  Okapi/Keenbow at TREC-8 , 1999, TREC.

[9]  Yiyu Yao,et al.  A probability distribution model for information retrieval , 1989, Inf. Process. Manag..

[10]  Richard M. Schwartz,et al.  A hidden Markov model information retrieval system , 1999, SIGIR '99.

[11]  W. Bruce Croft,et al.  A general language model for information retrieval , 1999, CIKM '99.

[12]  John D. Lafferty,et al.  Information retrieval as statistical translation , 1999, SIGIR '99.

[13]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[14]  Kenney Ng A Maximum Likelihood Ratio Information Retrieval Model , 1999, TREC.

[15]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .