Paper information - Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

In this article, we present a simple and effective approach for extracting a bilingual lexicon from comparable corpora enhanced with parallel corpora. We use structural characteristics of the documents in the comparable corpus to extract high-quality parallel sentences. We then use state-of-the-art techniques to build a specialized bilingual lexicon from these sentences and evaluate the contribution of this lexicon when it is added to the comparable corpus-based alignment technique. Finally, we demonstrate the value of this approach through improved translation accuracy for medical words.

Emmanuel Morin | Emmanuel Prochasson
