MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset covers seven language pairs, with human labels for 9,000 translations per language pair at two levels: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.
