Using Parallel Web Pages for Multi-lingual IR

In this report, we describe the approach we used for the CLEF Cross-Language IR (CLIR) tasks. In our experiments, we used statistical models estimated from parallel texts that were automatically mined from the Web. In previous experiments, we tested this approach on English-French and English-Chinese CLIR. The goal of the present series of experiments is to determine whether the approach can be extended to multilingual IR involving other languages. In particular, we compare models trained on the Web documents alone with models that also incorporate other resources, such as dictionaries.
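The abstract does not detail the models, so the following is only a minimal sketch of the general idea, assuming an IBM Model 1 style word-translation table P(f | e) estimated with EM from sentence-aligned pairs and then used to translate query terms. The toy corpus, the functions `train_model1`, `translate_query` and `interpolate_with_dictionary`, and the interpolation weight `lam` are hypothetical illustrations, not the authors' implementation. Linear interpolation is just one plausible way to combine a Web-mined model with a dictionary.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for Web-mined, sentence-aligned English-French text.
PAIRS = [
    (["the", "house"], ["la", "maison"]),
    (["the", "flower"], ["la", "fleur"]),
    (["a", "flower"], ["une", "fleur"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with IBM Model 1 EM (no NULL word, for brevity)."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform translation table
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in pairs:
            for f in tgt:
                # E-step: share f among all source words that could have generated it.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize so that each P(. | e) sums to 1.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def translate_query(query, t, top_k=2):
    """Map source query terms to weighted target terms, keeping the top_k per term."""
    weights = defaultdict(float)
    for e in query:
        candidates = sorted(((p, f) for (f, e2), p in t.items() if e2 == e), reverse=True)
        for p, f in candidates[:top_k]:
            weights[f] += p
    return dict(weights)


def interpolate_with_dictionary(t, dictionary, lam=0.7):
    """Hypothetical combination: lam * P_model + (1 - lam) * P_dict, where P_dict is
    uniform over the dictionary's translations. Words covered by only one resource
    keep that resource's distribution, so every P(. | e) still sums to 1."""
    model_sources = {e for (_, e) in t}
    combined = defaultdict(float)
    for (f, e), p in t.items():
        weight = lam if dictionary.get(e) else 1.0
        combined[(f, e)] += weight * p
    for e, translations in dictionary.items():
        if not translations:
            continue
        weight = (1 - lam) if e in model_sources else 1.0
        for f in translations:
            combined[(f, e)] += weight / len(translations)
    return combined


if __name__ == "__main__":
    t = train_model1(PAIRS)
    print(translate_query(["house", "flower"], t, top_k=1))
    combined = interpolate_with_dictionary(t, {"house": ["maison", "logis"]}, lam=0.5)
    print(translate_query(["house"], combined, top_k=2))
```

On this toy corpus, EM gradually attributes "la" to "the" (which co-occurs with it twice), so "house" ends up most strongly associated with "maison" and "flower" with "fleur". Real Web-mined corpora are far noisier, which is one motivation for mixing in a dictionary.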

Jian-Yun Nie | Michel Simard
