Quality Estimation and Translation Metrics via Pre-trained Word and Sentence Embeddings

We propose the use of pre-trained embeddings as features of a regression model for sentence-level quality estimation of machine translation. In our work we combine freely available BERT and LASER multilingual embeddings to train a neural-based regression model. In the second proposed method we use as an input features not only pre-trained embeddings, but also log probability of any machine translation (MT) system. Both methods are applied to several language pairs and are evaluated both as a classical quality estimation system (predicting the HTER score) as well as an MT metric (predicting human judgements of translation quality).

[1]  Timothy Baldwin,et al.  Can machine translation systems be evaluated by the crowd alone , 2015, Natural Language Engineering.

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Maja Popovic,et al.  chrF++: words helping character n-grams , 2017, WMT.

[4]  Philipp Koehn,et al.  Findings of the 2018 Conference on Machine Translation (WMT18) , 2018, WMT.

[5]  Ondrej Bojar,et al.  Results of the WMT18 Metrics Shared Task: Both characters and embeddings achieve good performance , 2018, WMT.

[6]  Holger Schwenk,et al.  Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond , 2018, Transactions of the Association for Computational Linguistics.

[7]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[8]  Kai Fan,et al.  "Bilingual Expert" Can Find Translation Errors , 2018, AAAI.

[9]  Philipp Koehn,et al.  Findings of the 2017 Conference on Machine Translation (WMT17) , 2017, WMT.

[10]  Karin M. Verspoor,et al.  Findings of the 2016 Conference on Machine Translation , 2016, WMT.

[11]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[12]  André F. T. Martins,et al.  Findings of the WMT 2019 Shared Tasks on Quality Estimation , 2019, WMT.

[13]  Nello Cristianini,et al.  Estimating the Sentence-Level Quality of Machine Translation Systems , 2009, EAMT.

[14]  Lucia Specia,et al.  deepQuest: A Framework for Neural-based Quality Estimation , 2018, COLING.

[15]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[16]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[17]  Jong-Hyeok Lee,et al.  Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation , 2017, WMT.

[18]  Ondrej Bojar,et al.  Results of the WMT17 Metrics Shared Task , 2017, WMT.

[19]  Lucia Specia,et al.  Findings of the WMT 2018 Shared Task on Quality Estimation , 2018, WMT.

[20]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[21]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[22]  Alex Kulesza,et al.  Confidence Estimation for Machine Translation , 2004, COLING.

[23]  Philipp Koehn,et al.  Results of the WMT15 Metrics Shared Task , 2015, WMT@EMNLP.

[24]  Ondrej Bojar,et al.  Results of the WMT13 Metrics Shared Task , 2015, WMT@EMNLP.