Measuring Machine Translation Errors in New Domains

We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.

Dragos Stefan Munteanu | John Morgan | Marine Carpuat | Hal Daumé | Ann Irvine
