Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios

Unsupervised neural machine translation (UNMT), which relies solely on massive monolingual corpora, has achieved remarkable results in several translation tasks. However, in real-world scenarios, massive monolingual corpora are not available for some extremely low-resource languages such as Estonian, and UNMT systems usually perform poorly when an adequate training corpus is not available for one of the languages. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. We then propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in this scenario. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.

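To make the self-training idea concrete, the sketch below illustrates one generic self-training round for UNMT in an unbalanced data setting: monolingual sentences from the data-rich language are translated by the current model into the data-poor language, and the resulting pseudo-parallel pairs are reused as additional training data. This is a minimal, hypothetical illustration only; the class UNMTModel and the helpers translate, fine_tune, and self_training_round are placeholders introduced here and do not reproduce the paper's actual mechanisms.

from typing import List, Tuple
import random

class UNMTModel:
    """Stand-in for an unsupervised NMT model (e.g., an XLM- or MASS-style encoder-decoder)."""

    def translate(self, sentences: List[str], src: str, tgt: str) -> List[str]:
        # Placeholder: a real system would run the trained model with beam search.
        return ["<{}> {}".format(tgt, s) for s in sentences]

    def fine_tune(self, pairs: List[Tuple[str, str]]) -> None:
        # Placeholder: a real system would run gradient updates on the pseudo-parallel pairs.
        pass

def self_training_round(model: UNMTModel,
                        rich_mono: List[str],
                        src_lang: str,
                        tgt_lang: str,
                        sample_size: int = 4) -> List[Tuple[str, str]]:
    """One self-training round: translate monolingual sentences from the
    data-rich language into the data-poor language and reuse the resulting
    pseudo-parallel pairs as additional training data."""
    batch = random.sample(rich_mono, min(sample_size, len(rich_mono)))
    pseudo_translations = model.translate(batch, src=src_lang, tgt=tgt_lang)
    pseudo_pairs = list(zip(batch, pseudo_translations))
    model.fine_tune(pseudo_pairs)
    return pseudo_pairs

if __name__ == "__main__":
    model = UNMTModel()
    en_mono = ["an English sentence %d" % i for i in range(10)]  # abundant side
    pairs = self_training_round(model, en_mono, src_lang="en", tgt_lang="et")
    print(pairs[:2])

In a real system, repeating such rounds as the model improves would let the synthetic pairs for the data-poor side grow in quality over time, which is the usual motivation for iterative self-training.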