The Missing Ingredient in Zero-Shot Neural Machine Translation

Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to training such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper, we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French-German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.

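The abstract does not spell out the form of the auxiliary losses beyond "representational invariance across languages". Below is a minimal PyTorch sketch of one plausible instantiation: a cosine-alignment loss that pulls together pooled encoder representations of parallel source and target sentences. All names (`pooled_representation`, `alignment_loss`, `align_weight`) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of an encoder-alignment auxiliary loss for multilingual NMT.
# Assumes a Transformer-style encoder whose per-token outputs are mean-pooled
# into fixed-size sentence vectors; all identifiers here are illustrative.
import torch
import torch.nn.functional as F

def pooled_representation(encoder_out: torch.Tensor,
                          pad_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool encoder states over non-padding positions.

    encoder_out: (batch, time, dim) encoder hidden states.
    pad_mask:    (batch, time), 1 for real tokens, 0 for padding.
    """
    mask = pad_mask.unsqueeze(-1).float()        # (batch, time, 1)
    summed = (encoder_out * mask).sum(dim=1)     # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1.0)      # avoid divide-by-zero
    return summed / counts

def alignment_loss(src_states: torch.Tensor, src_mask: torch.Tensor,
                   tgt_states: torch.Tensor, tgt_mask: torch.Tensor) -> torch.Tensor:
    """Cosine-distance loss between source/target sentence vectors.

    For a parallel pair (x, y), this encourages enc(x) ~ enc(y), pushing the
    encoder toward language-invariant representations.
    """
    src_vec = pooled_representation(src_states, src_mask)
    tgt_vec = pooled_representation(tgt_states, tgt_mask)
    return (1.0 - F.cosine_similarity(src_vec, tgt_vec, dim=-1)).mean()

# Training objective: supervised NMT loss plus the weighted alignment term,
# e.g. total_loss = nmt_loss + align_weight * alignment_loss(...)
```

Because the alignment term is added to the usual translation loss rather than replacing it, supervised directions keep their training signal, which is consistent with the abstract's claim that zero-shot quality improves without regressing on supervised pairs.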