Exploring Monolingual Data for Neural Machine Translation with Knowledge Distillation

We explore two types of monolingual data that can be included in knowledge distillation training for neural machine translation (NMT). The first is source-side monolingual data; the second is target-side monolingual data used as back-translation data. Both are (forward-)translated by a teacher model from the source language to the target language, and the resulting outputs are combined into a training set for smaller student models. We find that source-side monolingual data improves model performance when evaluated on test sets that originate from the source side, while target-side data has a similar positive effect on test sets in the opposite direction. We also show that the student model does not need to be trained on the same data as the teacher, as long as the domains match. Finally, we find that combining source-side and target-side data yields better performance than relying on either side of the monolingual data alone.
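The data-construction pipeline described above can be summarized in a minimal sketch (not the authors' code). The functions `teacher_translate` (source to target) and `backward_translate` (target to source) are hypothetical stand-ins for whatever teacher and back-translation models are actually used.

```python
# Minimal sketch, assuming hypothetical translation callables, of building a
# sequence-level knowledge-distillation training set from two kinds of
# monolingual data.
from typing import Callable, Iterable, List, Tuple


def build_distillation_data(
    src_monolingual: Iterable[str],
    tgt_monolingual: Iterable[str],
    teacher_translate: Callable[[List[str]], List[str]],   # source -> target (assumed)
    backward_translate: Callable[[List[str]], List[str]],  # target -> source (assumed)
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for training a smaller student model."""
    pairs: List[Tuple[str, str]] = []

    # 1) Source-side monolingual data: forward-translate with the teacher so
    #    the student learns from the teacher's outputs.
    src_sents = list(src_monolingual)
    pairs.extend(zip(src_sents, teacher_translate(src_sents)))

    # 2) Target-side monolingual data: back-translate into the source language
    #    first, then forward-translate the synthetic source with the teacher
    #    to obtain distilled targets.
    synthetic_src = backward_translate(list(tgt_monolingual))
    pairs.extend(zip(synthetic_src, teacher_translate(synthetic_src)))

    return pairs
```

Both portions of the resulting set consist of teacher outputs on the target side, which is what makes them usable for distilling a student model; the two portions differ only in the provenance of their source sentences.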
