The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task

This paper presents the JHU-Microsoft joint submission to the WMT 2021 Quality Estimation shared task. We participate only in Task 2 (post-editing effort estimation), focusing on target-side word-level quality estimation. The techniques we experiment with include Levenshtein Transformer training and data augmentation combining forward translation, backward translation, round-trip translation, and pseudo post-editing of the MT output. We demonstrate the competitiveness of our system against the widely adopted OpenKiwi-XLM baseline. Our system is also the top-ranking system on the MT MCC metric for the English-German language pair.
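The word-level task assigns each MT token an OK or BAD tag depending on whether it survives into the (pseudo) post-edit. The snippet below is a minimal illustrative sketch of how such tags can be derived, using Python's difflib for alignment rather than the official TER-based tagging pipeline, and with made-up example sentences; it is not the system described in the paper.

```python
# Sketch only: derive word-level OK/BAD tags by aligning an MT hypothesis
# against a (pseudo) post-edit. The real WMT labels use TER-based alignment;
# difflib is used here purely for illustration.
from difflib import SequenceMatcher

def word_tags(mt_tokens, pe_tokens):
    """Tag each MT token OK if it is kept in the post-edit, BAD otherwise."""
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags

if __name__ == "__main__":
    mt = "das ist ein kleines Haus".split()   # hypothetical MT output
    pe = "das ist ein großes Haus".split()    # hypothetical post-edit
    print(list(zip(mt, word_tags(mt, pe))))
    # [('das', 'OK'), ('ist', 'OK'), ('ein', 'OK'), ('kleines', 'BAD'), ('Haus', 'OK')]
```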
