Can Automatic Post-Editing Improve NMT?

Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE has had notable success when used with statistical machine translation (SMT) systems but has been less successful with neural machine translation (NMT) systems. This has raised questions about the relevance of the APE task in the current scenario. However, the training of APE models has relied heavily on large-scale artificial corpora combined with only limited human post-edited data. We hypothesize that APE models have underperformed in improving NMT translations because they lack adequate supervision. To test this hypothesis, we compile a larger corpus of human post-edits of English-to-German NMT output. We empirically show that a state-of-the-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field. We further investigate the effects of varying training data sizes, using artificial training data, and domain specificity for the APE task. We release this new corpus under the CC BY-NC-SA 4.0 license.
