Paper information - Ad hoc and Multilingual Information Retrieval at IBM

IBM participated in two tracks at TREC-7: ad hoc and cross-language. In the ad hoc task we contrasted the performance of two different query expansion techniques: local context analysis and a probabilistic model. Two themes characterize IBM's participation in the CLIR track at TREC-7. The first is the use of statistical methods. In order to use the document translation approach, we built a fast (translation time within an order of magnitude of the indexing time) French→English translation model trained from parallel corpora. We also trained German→French and Italian→French translation models entirely from comparable corpora. The unique characteristic of the work described here is that all bilingual resources and translation models were learned automatically from corpora (parallel and comparable). The other theme is that the widely varying quality and availability of bilingual resources means that language pairs must be treated separately. We will describe methods for using one language as a pivot language in order to decrease the number of language pairs, as well as methods for merging the results from several retrievals.

Salim Roukos | Martin Franz | J. Scott McCarley
