The Weaver System for Document Retrieval

This paper introduces Weaver, a probabilistic document retrieval system under development at Carnegie Mellon University, and discusses its performance in the TREC-8 ad hoc evaluation. We begin by describing the architecture and philosophy of the Weaver system, which represents a departure from traditional approaches to retrieval. The central ingredient is a statistical model of how a user might distill or "translate" a given document into a query. The retrieval-as-translation approach is based on the noisy channel paradigm and statistical language modeling, and has much in common with other recently proposed models [12, 10]. After the initial high-level overview, the bulk of the paper contains a discussion of implementation details and the empirical performance of the Weaver retrieval system.
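To make the retrieval-as-translation idea concrete, the following is a minimal sketch and not Weaver's actual implementation. It scores a document by the log-probability that its words "translate" into the query words, mixing a word-to-word translation table with a background distribution. The function name `translation_score`, the table `t`, the `background` distribution, the mixing weight `alpha`, and the fallback probability are all assumptions introduced for illustration.

```python
import math
from collections import Counter

def translation_score(query, doc, t, background, alpha=0.5):
    """Log-probability that `doc` is 'translated' into `query` (illustrative sketch).

    query, doc : lists of word tokens
    t          : dict mapping (query_word, doc_word) -> translation probability (hypothetical)
    background : dict mapping query_word -> collection-wide probability (hypothetical)
    alpha      : assumed mixing weight between the document model and the background
    """
    counts = Counter(doc)
    n = len(doc)
    log_p = 0.0
    for q in query:
        # Each query word is generated by some document word w:
        # sum over w of t(q | w) * (relative frequency of w in the document).
        p_doc = sum(t.get((q, w), 0.0) * c / n for w, c in counts.items())
        # Smooth with the background so unseen query words do not yield log(0).
        p = alpha * p_doc + (1 - alpha) * background.get(q, 1e-9)
        log_p += math.log(p)
    return log_p
```

Under this sketch, documents would be ranked by decreasing score, so a document containing "automobile" can match the query "car" whenever the table assigns that word pair a nonzero translation probability.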

[1]  Gerard Salton, et al. Term-Weighting Approaches in Automatic Text Retrieval, 1988, Inf. Process. Manag.

[2]  Stephen E. Robertson, et al. Okapi at TREC-3, 1994, TREC.

[3]  Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[4]  John Cocke, et al. A Statistical Approach to Machine Translation, 1990, CL.

[5]  Stephen E. Robertson, et al. Relevance weighting of search terms, 1976, J. Am. Soc. Inf. Sci.

[6]  John D. Lafferty, et al. Information retrieval as statistical translation, 1999, SIGIR '99.

[7]  John D. Lafferty, et al. The Candide System for Machine Translation, 1994, HLT.

[8]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[9]  Andrei Broder, Monika Henzinger. Information retrieval on the Web: Tools & algorithmic issues, 1998.

[10]  Richard M. Schwartz, et al. BBN at TREC7: Using Hidden Markov Models for Information Retrieval, 1998, TREC.

[11]  Don R. Swanson, et al. Probabilistic models for automatic indexing, 1974, J. Am. Soc. Inf. Sci.

[12]  W. Bruce Croft, et al. Using Probabilistic Models of Document Retrieval without Relevance Information, 1979, J. Documentation.

[13]  W. Bruce Croft, et al. Efficient probabilistic Inference for text retrieval, 1991, RIAO.

[14]  W. N. Locke, et al. Machine Translation of Languages, 1956.

[15]  Kenneth Ward Church, et al. Identifying Word Correspondences in Parallel Texts, 1991, HLT.