Paper Information - Relevance-Based Language Models

Relevance-Based Language Models

We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: probabilities of words in the relevant class. We propose a novel technique for estimating these probabilities using the query alone. We demonstrate that our technique can produce highly accurate relevance models, addressing important notions of synonymy and polysemy. Our experiments show relevance models outperforming baseline language modeling systems on TREC retrieval and TDT tracking tasks. The main contribution of this work is an effective formal method for estimating a relevance model with no training data.

W. Bruce Croft | Victor Lavrenko

[1] Stephen E. Robertson, et al. Relevance weighting of search terms, 1976, J. Am. Soc. Inf. Sci.

[2] Van Rijsbergen, et al. A theoretical basis for the use of co-occurrence data in information retrieval, 1977.

[3] W. Bruce Croft, et al. Efficient probabilistic inference for text retrieval, 1991, RIAO.

[4] Stephen E. Robertson, et al. Okapi at TREC-3, 1994, TREC.

[5] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[6] Stephen E. Robertson, et al. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval, 1994, SIGIR '94.

[7] Stephen E. Robertson, et al. Gatford Centre for Interactive Systems Research Department of Information, 1996.

[8] S. Robertson. The probability ranking principle in IR, 1997.

[9] Alvin F. Martin, et al. The DET curve in assessment of detection task performance, 1997, EUROSPEECH.

[10] R. Papka, et al. On-line new event detection and tracking, 1998, SIGIR '98.

[11] Richard M. Schwartz, et al. A hidden Markov model information retrieval system, 1999, SIGIR '99.