Levenshtein Training for Word-level Quality Estimation

We propose a novel scheme to use the Levenshtein Transformer for the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, it can learn to post-edit without explicit supervision. To further reduce the mismatch between the translation task and the word-level QE task, we propose a two-stage transfer learning procedure on both augmented data and human post-editing data. We also propose heuristics to construct reference labels that are compatible with subword-level finetuning and inference. Results on the WMT 2020 QE shared task dataset show that our proposed method has superior data efficiency under the data-constrained setting and competitive performance under the unconstrained setting.
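To make the subword-level labeling concrete, the sketch below shows one plausible (hypothetical, not necessarily the authors') heuristic for projecting word-level OK/BAD tags onto subword units so that finetuning and inference can operate at the subword level. The function name and the SentencePiece-style "▁" word-boundary convention are assumptions made for illustration.

```python
from typing import List

def project_tags_to_subwords(words: List[str],
                             word_tags: List[str],
                             subwords: List[str]) -> List[str]:
    """Copy each word-level OK/BAD tag to all of its subword pieces.

    Assumes `subwords` marks the start of each word with a "▁" prefix,
    as SentencePiece does; other tokenizers would need a different
    boundary check.
    """
    assert len(words) == len(word_tags)
    subword_tags = []
    word_idx = -1
    for piece in subwords:
        if piece.startswith("▁"):   # a new word begins at this piece
            word_idx += 1
        subword_tags.append(word_tags[word_idx])
    assert word_idx == len(words) - 1, "subwords and words are misaligned"
    return subword_tags

# Example: a word split into two pieces inherits its tag on both pieces.
print(project_tags_to_subwords(
    ["a", "translation", "error"],
    ["OK", "OK", "BAD"],
    ["▁a", "▁transla", "tion", "▁error"],
))  # ['OK', 'OK', 'OK', 'BAD']
```

At inference time, subword-level predictions produced under such a scheme can be mapped back to word-level tags (e.g., by aggregating the tags of a word's pieces), keeping evaluation compatible with the word-level QE task.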
