Quality In, Quality Out: Learning from Actual Mistakes

Approaches to Quality Estimation (QE) of machine translation have shown promising results at predicting quality scores for translated sentences. However, QE models are often trained on noisy approximations of quality annotations derived from the proportion of post-edited words in translated sentences instead of direct human annotations of translation errors. The latter is a more reliable ground-truth but more expensive to obtain. In this paper, we present the first attempt to model the task of predicting the proportion of actual translation errors in a sentence while minimising the need for direct human annotation. For that purpose, we use transfer-learning to leverage large scale noisy annotations and small sets of high-fidelity human annotated translation errors to train QE models. Experiments on four language pairs and translations obtained by statistical and neural models show consistent gains over strong baselines.

[1]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[2]  Karin M. Verspoor,et al.  Findings of the 2016 Conference on Machine Translation , 2016, WMT.

[3]  Yvette Graham,et al.  Improving Evaluation of Machine Translation Quality Estimation , 2015, ACL.

[4]  Sebastian Ruder,et al.  Universal Language Model Fine-tuning for Text Classification , 2018, ACL.

[5]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[6]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[7]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[8]  Lucia Specia,et al.  Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation , 2013, ACL.

[9]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[10]  Alex Kulesza,et al.  Confidence Estimation for Machine Translation , 2004, COLING.

[11]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[12]  Lucia Specia,et al.  Multi-level Translation Quality Prediction with QuEst++ , 2015, ACL.

[13]  Lucia Specia,et al.  Translation Quality and Productivity: A Study on Rich Morphology Languages , 2017, MTSUMMIT.

[14]  André F. T. Martins,et al.  Unbabel's Participation in the WMT19 Translation Quality Estimation Shared Task , 2019, WMT.

[15]  Philipp Koehn,et al.  Findings of the 2013 Workshop on Statistical Machine Translation , 2013, WMT@ACL.

[16]  Jong-Hyeok Lee,et al.  Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation , 2017, WMT.

[17]  Philipp Koehn,et al.  Findings of the 2015 Workshop on Statistical Machine Translation , 2015, WMT@EMNLP.

[18]  Philipp Koehn,et al.  Findings of the 2017 Conference on Machine Translation (WMT17) , 2017, WMT.

[19]  Deniz Yuret,et al.  Transfer Learning for Low-Resource Neural Machine Translation , 2016, EMNLP.

[20]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[21]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[22]  A. Burchardt,et al.  Multidimensional Quality Metrics (MQM): A Framework for Declaring and Describing Translation Quality Metrics , 2014 .

[23]  Lucia Specia,et al.  deepQuest: A Framework for Neural-based Quality Estimation , 2018, COLING.

[24]  Hermann Ney,et al.  Word-Level Confidence Estimation for Machine Translation , 2007, CL.

[25]  Lucia Specia,et al.  Exploiting Objective Annotations for Minimising Translation Post-editing Effort , 2011, EAMT.

[26]  Philipp Koehn,et al.  Findings of the 2014 Workshop on Statistical Machine Translation , 2014, WMT@ACL.

[27]  Nello Cristianini,et al.  Estimating the Sentence-Level Quality of Machine Translation Systems , 2009, EAMT.

[28]  Lucia Specia,et al.  Large-scale Multitask Learning for Machine Translation Quality Estimation , 2016, NAACL.

[29]  Guillaume Lample,et al.  Cross-lingual Language Model Pretraining , 2019, NeurIPS.

[30]  Chris Quirk,et al.  Training a Sentence-Level Machine Translation Confidence Measure , 2004, LREC.

[31]  Lucia Specia,et al.  Findings of the WMT 2018 Shared Task on Quality Estimation , 2018, WMT.

[32]  André F. T. Martins,et al.  Findings of the WMT 2019 Shared Tasks on Quality Estimation , 2019, WMT.