Regularized estimation of mixture models for robust pseudo-relevance feedback

Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.

[1]  W. Bruce Croft,et al.  Improving the effectiveness of information retrieval with local context analysis , 2000, TOIS.

[2]  W. Bruce Croft,et al.  Relevance-Based Language Models , 2001, SIGIR '01.

[3]  Donna K. Harman,et al.  The NRRC reliable information access (RIA) workshop , 2004, SIGIR '04.

[4]  Luo Si,et al.  Effect of varying number of documents in blind feedback: analysis of the 2003 NRRC RIA workshop "bf_numdocs" experiment suite , 2004, SIGIR '04.

[5]  John D. Lafferty,et al.  Model-based feedback in the language modeling approach to information retrieval , 2001, CIKM '01.

[6]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[7]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[8]  Djoerd Hiemstra,et al.  Term-specific smoothing for the language modeling approach to information retrieval: the importance of a query term , 2002, SIGIR '02.

[9]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[10]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[11]  Tetsuya Sakai,et al.  Flexible pseudo-relevance feedback via selective sampling , 2005, TALIP.

[12]  David A. Evans,et al.  Design and Evaluation of the CLARIT-TREC-2 System , 1993, TREC.

[13]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[14]  Tao Tao,et al.  A Mixture Clustering Model for Pseudo Feedback in Information Retrieval , 2004 .

[15]  Carmel Domshlak,et al.  Better than the real thing?: iterative pseudo-query processing using cluster-based language models , 2005, SIGIR '05.