Online adaptation to post-edits for phrase-based statistical machine translation

Recent research has shown that accuracy and speed of human translators can benefit from post-editing output of machine translation systems, with larger benefits for higher quality output. We present an efficient online learning framework for adapting all modules of a phrase-based statistical machine translation system to post-edited translations. We use a constrained search technique to extract new phrase-translations from post-edits without the need of re-alignments, and to extract phrase pair features for discriminative training without the need for surrogate references. In addition, a cache-based language model is built on $n$-grams extracted from post-edits. We present experimental results in a simulated post-editing scenario and on field-test data. Each individual module substantially improves translation quality. The modules can be implemented efficiently and allow for a straightforward stacking, yielding significant additive improvements on several translation directions and domains.

Mauro Cettolo | Marcello Federico | Stefan Riezler | Katharina Wäschle | Nicola Bertoldi | Patrick Simianer
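
The abstract mentions a cache-based language model built on $n$-grams extracted from post-edits. The sketch below is only an illustration of that general idea, not the authors' implementation: the class name `NGramCache`, the exponential decay by age, and the parameters `max_order` and `decay` are all assumptions introduced here. It stores every $n$-gram of each incoming post-edit and rewards recently seen $n$-grams more than older ones.

```python
import math
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


class NGramCache:
    """Toy cache of n-grams taken from post-edited sentences (illustrative only)."""

    def __init__(self, max_order: int = 4, decay: float = 0.5) -> None:
        self.max_order = max_order  # assumed longest n-gram kept in the cache
        self.decay = decay          # assumed decay rate per post-edit
        self.clock = 0              # number of post-edits seen so far
        self.last_seen: Dict[NGram, int] = {}

    def add_post_edit(self, tokens: List[str]) -> None:
        """Record every n-gram (n = 1..max_order) of one post-edited sentence."""
        self.clock += 1
        for n in range(1, self.max_order + 1):
            for i in range(len(tokens) - n + 1):
                self.last_seen[tuple(tokens[i:i + n])] = self.clock

    def score(self, ngram: NGram) -> float:
        """Return exp(-decay * age) for a cached n-gram, 0.0 if it is not cached."""
        seen = self.last_seen.get(ngram)
        if seen is None:
            return 0.0
        age = self.clock - seen  # post-edits received since the n-gram was last seen
        return math.exp(-self.decay * age)


# Hypothetical usage: two post-edits arrive one after the other.
cache = NGramCache(max_order=2)
cache.add_post_edit("the patent claim".split())
cache.add_post_edit("the claim is valid".split())
recent = cache.score(("the",))             # seen in the latest post-edit
older = cache.score(("patent", "claim"))   # seen one post-edit earlier
unseen = cache.score(("translation",))     # never cached
```

In a real decoder, such a score would presumably enter the log-linear model as one additional feature alongside the static language model; how the paper weights and combines it is not stated in the abstract.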
