Neural Post-Editing Based on Quality Estimation

Automatic post-editing (APE) is a challenging task in the WMT evaluation campaign. An analysis of the training set of the WMT17 APE en-de task shows that most machine translation outputs require only a small number of edit operations. Based on this statistical analysis, two neural post-editing (NPE) models are trained according to the number of edits: single edit and minor edits. An improved quality estimation (QE) approach is used to rank the models and to select the best translation as the post-edited output from the n-best translation hypotheses generated by the best APE model and the raw translation system. Experimental results on the WMT16 APE test set show that the proposed approach significantly outperforms the baseline. The approach also brings considerable relief from the overcorrection problem in APE.
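The two ideas in the abstract, counting the edits an MT output needs and picking the final output by QE score, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `word_edit_distance` and `select_by_qe` are hypothetical. The edit count is a plain word-level Levenshtein distance (a simplification that ignores the shift operation counted by TER), and `qe_score` stands in for any QE model, assumed here to return a lower-is-better value such as a predicted edit rate.

```python
from typing import Callable, Sequence


def word_edit_distance(hyp: str, ref: str) -> int:
    """Word-level Levenshtein distance: insertions, deletions, substitutions.

    Simplification: unlike TER, block shifts are not modelled.
    """
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distance from empty hypothesis prefix
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete hw
                            curr[j - 1] + 1,      # insert rw
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def select_by_qe(
    source: str,
    raw_mt: str,
    ape_nbest: Sequence[str],
    qe_score: Callable[[str, str], float],
) -> str:
    """Return the candidate with the lowest (best) QE score.

    Assumption: qe_score(source, hypothesis) is lower-is-better.
    The raw MT output is listed first, so it wins ties; this is one
    simple way to avoid editing an output that needs no change.
    """
    candidates = [raw_mt, *ape_nbest]
    return min(candidates, key=lambda c: qe_score(source, c))
```

In use, `word_edit_distance(mt, post_edit)` would be computed over a training set to see how many segments need zero, one, or a few edits, and `select_by_qe` would be called once per segment with the raw translation and the APE n-best list.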
