An Oblivious Approach to Machine Translation Quality Estimation

Machine translation (MT) is used by millions of people daily, so evaluating the quality of MT systems is an important task. While evaluation of MT output by human experts remains the most accurate method, it does not scale. Automatic procedures for Machine Translation Quality Estimation (MT-QE) are typically trained on a large corpus of source–target sentence pairs labeled with human judgment scores, and the test set is typically drawn from the same distribution as the training set. Recently, however, interest in low-resource and unsupervised MT-QE has grown. In this paper, we define and study a further restriction of the unsupervised MT-QE setting, which we call oblivious MT-QE: besides having no access to human judgment scores, the algorithm has no access to the distribution of the test text. We propose an oblivious MT-QE system based on a new notion of sentence cohesiveness that we introduce. We tested our system on standard competition datasets for various language pairs. In all cases, its performance was comparable to that of the non-oblivious baseline system provided by the competition organizers. Our results suggest that reasonable MT-QE can be carried out even in the restrictive oblivious setting.