Paper information - Information retrieval as statistical translation


We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the relevance of a document to a user's query, we estimate the probability that the query would have been generated as a translation of the document, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well motivated model of the document-to-query translation process, and describe an algorithm for learning the parameters of this model in an unsupervised manner from a collection of documents. As we show, one can view this approach as a generalization and justification of the "language modeling" strategy recently proposed by Ponte and Croft. In a series of experiments on TREC data, a simple translation-based retrieval system performs well in comparison to conventional retrieval techniques. This prototype system only begins to tap the full potential of translation-based retrieval.
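The scoring idea in the abstract, ranking each document by the probability that the query is a "translation" of it, weighted by a prior over documents, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the word-for-word translation table `t` is hand-written toy data (the paper learns its parameters from a document collection), and the interpolation weight `alpha` and the `background` word probabilities are hypothetical smoothing choices.

```python
import math
from collections import Counter


def query_log_likelihood(query, document, t, background, alpha=0.5):
    """Return log p(query | document) under a toy word-for-word translation model.

    t[w][q] is an assumed probability that document word w "translates" to
    query word q. `background` and `alpha` are hypothetical smoothing choices.
    """
    counts = Counter(document)
    n = len(document)
    log_p = 0.0
    for q in query:
        # Sum over document words w of t(q | w) * l(w | d), where l(w | d)
        # is the empirical unigram distribution of the document.
        translated = 0.0
        if n > 0:
            translated = sum(
                t.get(w, {}).get(q, 0.0) * c / n for w, c in counts.items()
            )
        # Interpolate with a background model so every query word keeps
        # some probability mass even when no document word translates to it.
        p = alpha * translated + (1 - alpha) * background.get(q, 1e-9)
        log_p += math.log(p)
    return log_p


def rank(query, docs, t, background, prior=None):
    """Order document ids by log p(d) + log p(query | d), best first."""
    scores = {}
    for doc_id, words in docs.items():
        log_prior = math.log(prior[doc_id]) if prior else 0.0
        scores[doc_id] = log_prior + query_log_likelihood(
            query, words, t, background
        )
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): neither query word occurs literally in any document.
docs = {
    "d1": ["car", "engine", "repair"],
    "d2": ["stock", "market", "report"],
}
t = {
    "car": {"car": 0.7, "automobile": 0.3},
    "engine": {"engine": 1.0},
    "repair": {"repair": 0.8, "fix": 0.2},
    "stock": {"stock": 1.0},
    "market": {"market": 1.0},
    "report": {"report": 1.0},
}
background = {"automobile": 0.01, "fix": 0.01}
query = ["automobile", "fix"]

ranking = rank(query, docs, t, background)
```

In this toy setup the translation table lets "d1" score above "d2" even though the query shares no literal term with either document, which is the kind of synonymy a translation-style model is meant to capture. Passing a `prior` dictionary shows how a strong document prior can override the translation score.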

John Lafferty | Adam L. Berger
