Paper information - Bayesian extension to the language model for ad hoc information retrieval

Bayesian extension to the language model for ad hoc information retrieval

We propose a Bayesian extension to the ad-hoc Language Model. Many smoothed estimators used for the multinomial query model in ad-hoc Language Models (including Laplace and Bayes-smoothing) are approximations to the Bayesian predictive distribution. In this paper we derive the full predictive distribution in a form amenable to implementation by classical IR models, and then compare it to other currently used estimators. In our experiments the proposed model outperforms Bayes-smoothing, and its combination with linear interpolation smoothing outperforms all other estimators.
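For orientation, the smoothed estimators named in the abstract can be sketched in their commonly cited textbook forms. This is an illustration only and does not reproduce the predictive distribution derived in the paper. The function names, the toy document, the collection probabilities, and the parameter values `mu` and `lam` are all assumptions made for this example.

```python
# Minimal sketch of three standard smoothed unigram estimators.
# All names and numbers here are hypothetical, chosen for illustration.

def laplace(count, doc_len, vocab_size):
    """Laplace (add-one) estimate of P(term | document)."""
    return (count + 1) / (doc_len + vocab_size)

def bayes_smoothing(count, doc_len, p_coll, mu=1000.0):
    """Bayes-smoothing (Dirichlet prior): the collection model acts as
    pseudo-counts with total mass mu."""
    return (count + mu * p_coll) / (doc_len + mu)

def linear_interpolation(count, doc_len, p_coll, lam=0.5):
    """Linear interpolation smoothing. Assumed convention: lam weights the
    collection model and (1 - lam) the maximum-likelihood document model."""
    p_ml = count / doc_len if doc_len else 0.0
    return (1 - lam) * p_ml + lam * p_coll

# Toy document and an assumed collection distribution over a 4-term vocabulary.
doc = "language model language retrieval".split()
p_collection = {"language": 0.4, "model": 0.3, "retrieval": 0.2, "bayesian": 0.1}

doc_len = len(doc)
for term, p_c in p_collection.items():
    c = doc.count(term)
    print(
        term,
        round(laplace(c, doc_len, len(p_collection)), 4),
        round(bayes_smoothing(c, doc_len, p_c, mu=4.0), 4),
        round(linear_interpolation(c, doc_len, p_c, lam=0.5), 4),
    )
```

Each estimator gives the unseen term "bayesian" a non-zero probability, which is the purpose of smoothing: a query term missing from a document no longer drives the query likelihood to zero.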

Djoerd Hiemstra | Hugo Zaragoza | Michael E. Tipping

[1] Djoerd Hiemstra, et al. Twenty-One at TREC7: Ad-hoc and Cross-Language Track, 1998, TREC.

[2] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[3] John D. Lafferty, et al. Information retrieval as statistical translation, 1999, SIGIR '99.

[4] John D. Lafferty, et al. A study of smoothing methods for language models applied to Ad Hoc information retrieval, 2001, SIGIR '01.

[5] G. Wahba. Spline models for observational data, 1990.

[6] David J. C. MacKay, et al. A hierarchical Dirichlet language model, 1995, Natural Language Engineering.

[7] Donald H. Kraft, et al. SIGIR 2001: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, September 9-13, 2001, New Orleans, Louisiana, USA, 2001, SIGIR.

[8] Djoerd Hiemstra, et al. Language models and probability of relevance, 2001.

[9] John D. Lafferty, et al. Two-stage language models for information retrieval, 2002, SIGIR '02.

[10] Richard M. Schwartz, et al. BBN at TREC7: Using Hidden Markov Models for Information Retrieval, 1998, TREC.

[11] Ellen M. Voorhees, et al. The seventh text REtrieval conference (TREC-7), 1999.