Paper information - Automatic construction of parallel English-Chinese corpus for cross-language information retrieval


A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper, we first describe a parallel text mining system that automatically finds parallel texts on the Web. The resulting Chinese-English parallel corpus is used to train a probabilistic translation model that translates queries for Chinese-English cross-language information retrieval (CLIR). We then discuss some problems in translation model training and present preliminary CLIR results.

Jian-Yun Nie | Jiang Chen
