Issues in incremental adaptation of statistical MT from human post-edits

This work investigates a crucial aspect of integrating MT technology into a CAT environment: the ability of MT systems to adapt to user feedback. In particular, we consider the scenario of an MT system tuned for a specific translation project that, after each day of work, adapts to the post-edited translations produced by the user. We apply and compare several state-of-the-art adaptation methods on post-edits generated by two professional translators during two days of work with a CAT tool embedding MT suggestions. Both translators worked on the same legal document, translating from English into Italian and German, respectively. Although exactly the same amount of post-edited data was available each day for each language, applying the same adaptation methods produced quite different outcomes. This suggests that adaptation strategies should not be applied blindly, but with attention to language-specific issues such as data sparsity.
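As a concrete illustration of one family of adaptation strategies mentioned above, the sketch below shows mixture-model adaptation by linear interpolation: a background model estimated from generic data is combined with a small in-domain model estimated from a day's post-edits. This is a minimal toy example over unigram probabilities, not the exact systems or data used in the experiments; all corpora and the weight `lam` are hypothetical.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_background, p_indomain, lam):
    """Linear interpolation: p(w) = lam * p_in(w) + (1 - lam) * p_bg(w)."""
    vocab = set(p_background) | set(p_indomain)
    return {
        w: lam * p_indomain.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


# Hypothetical data: a background model from a large generic corpus and an
# in-domain model from one day's post-edited translations.
bg = unigram_probs("the court shall decide the case".split())
day1 = unigram_probs("the tribunal shall hear the appeal".split())

# lam controls how strongly the day's post-edits influence the adapted model;
# in practice it would be tuned on held-out in-domain data.
adapted = interpolate(bg, day1, lam=0.5)
```

In an incremental setting, the in-domain model would simply be re-estimated (or updated) after each day of post-editing, and the interpolation weight re-tuned, which is what makes the outcome sensitive to how sparse each day's data is for a given language.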
