Unsupervised Neural Machine Translation for Similar and Distant Language Pairs

Unsupervised neural machine translation (UNMT) has achieved remarkable results for several language pairs, such as French–English and German–English. Most previous studies have focused on modeling UNMT systems; few have investigated how UNMT behaves on specific languages. In this article, we first empirically investigate UNMT for four diverse language pairs (French/German/Chinese/Japanese–English). We confirm that UNMT performs dramatically better on translation tasks for similar language pairs (French/German–English) than for distant language pairs (Chinese/Japanese–English). We show empirically that the lack of shared words and the differences in word order are the main reasons UNMT underperforms on Chinese/Japanese–English. Based on these findings, we propose several methods, including artificial shared words and pre-ordering, to improve UNMT performance for distant language pairs. Moreover, we propose a simple, general method that improves translation performance for all four language pairs. The existing UNMT model can generate translations of reasonable quality after only a few training epochs, owing to its denoising mechanism and shared latent representations. However, learning shared latent representations restricts translation performance in both directions, particularly for distant language pairs, while denoising dramatically delays convergence by continuously modifying the training data. To avoid these problems, we propose a simple yet effective and efficient approach that, like UNMT, relies solely on monolingual corpora: pseudo-data-based unsupervised neural machine translation. Experimental results for these four language pairs show that our proposed methods significantly outperform UNMT baselines.
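To make the denoising mechanism mentioned above concrete, the sketch below illustrates the noise model commonly used for the denoising objective in UNMT (word dropout plus a local word shuffle applied to monolingual sentences, which the model then learns to reconstruct). This is a generic illustration, not the authors' code; the parameter names `p_drop` and `k` and their values are assumptions for the example.

```python
import random

def add_noise(tokens, p_drop=0.1, k=3):
    """Corrupt a tokenized sentence for denoising autoencoding (illustrative)."""
    # 1) Word dropout: delete each token with probability p_drop.
    kept = [t for t in tokens if random.random() > p_drop]
    if not kept:                      # keep at least one token
        kept = [random.choice(tokens)]
    # 2) Local shuffle: displace each token by at most k positions,
    #    so the word order is only mildly perturbed.
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda x: x[0])]

# The denoising loss is then the reconstruction of the original sentence
# from its noisy version, e.g. reconstruct `tokens` from `add_noise(tokens)`.
print(add_noise("the quick brown fox jumps over the lazy dog".split()))
```

Because this corruption is re-sampled at every epoch, the training data keeps changing, which is the source of the delayed convergence discussed in the abstract; the pseudo-data-based approach proposed in the article avoids this by training on fixed pseudo-parallel data built from monolingual corpora.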
