Improving the precision of automatically constructed human-oriented translation dictionaries

In this paper we address the problem of automatically acquiring a human-oriented translation dictionary from a large-scale parallel corpus. Initial translation equivalents can be extracted with the techniques and tools developed for phrase-table construction in statistical machine translation. The acquired equivalents usually provide good lexicon coverage, but they also contain a large amount of noise. We propose a supervised learning algorithm for detecting noisy translations that takes into account context and syntax features averaged over the sentences in which a given phrase pair occurs. Across nine European language pairs, the number of serious translation errors is reduced by 43.2% compared to a baseline that uses only phrase-level statistics.
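The noise-detection step summarized in the abstract can be pictured roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes that each phrase pair comes with phrase-level statistics (the baseline features, e.g. translation probabilities from the phrase table) and with one context/syntax feature vector per sentence in which the pair occurs. The helper names `average_features`, `build_features`, `train_logistic_regression` and `is_noisy` are hypothetical. Plain logistic regression trained by gradient descent stands in for whatever supervised learner the paper actually uses.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def average_features(
    occurrences: Sequence[Tuple[PhrasePair, Sequence[float]]],
) -> Dict[PhrasePair, List[float]]:
    """Average the per-sentence context/syntax features of each phrase pair.

    Each occurrence is (pair, features) for one sentence containing the pair.
    The concrete feature set is a hypothetical placeholder.
    """
    sums: Dict[PhrasePair, List[float]] = {}
    counts: Dict[PhrasePair, int] = defaultdict(int)
    for pair, feats in occurrences:
        acc = sums.setdefault(pair, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[pair] += 1
    return {pair: [s / counts[pair] for s in acc] for pair, acc in sums.items()}


def build_features(
    phrase_stats: Dict[PhrasePair, List[float]],
    averaged: Dict[PhrasePair, List[float]],
    pair: PhrasePair,
) -> List[float]:
    """Concatenate phrase-level statistics (baseline) with averaged sentence features."""
    return phrase_stats[pair] + averaged[pair]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(w: List[float], b: float, x: Sequence[float]) -> float:
    # Linear score w·x + b.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the mean log-loss; label 1 means 'noisy'."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score.
            err = _sigmoid(_score(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_noisy(w: List[float], b: float, x: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag a phrase pair as a noisy translation if P(noisy | x) >= threshold."""
    return _sigmoid(_score(w, b, x)) >= threshold
```

If you drop the `averaged[pair]` part of `build_features` and train on `phrase_stats[pair]` alone, you get a baseline restricted to phrase-level statistics. That is the kind of comparison the abstract reports. Averaging is one simple way to turn a variable number of sentence occurrences into a fixed-length vector. Other aggregations, such as maxima or counts, would be equally plausible assumptions here.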

Alexandra Antonova | Alexey Misyurev