When and Why is Unsupervised Neural Machine Translation Useless?

This paper studies the practicality of current state-of-the-art unsupervised methods in neural machine translation (NMT). Across ten translation tasks with varied data settings, we analyze the conditions under which unsupervised methods fail to produce reasonable translations. We show that their performance is severely degraded by linguistic dissimilarity and by domain mismatch between the source and target monolingual data. Both conditions are common for low-resource language pairs, which is precisely where unsupervised learning works poorly. In all of our experiments, supervised and semi-supervised baselines trained on only 50k bilingual sentence pairs outperform the best unsupervised results. Our analyses pinpoint the limits of current unsupervised NMT and suggest immediate research directions.
