Massively Multilingual Neural Machine Translation

Multilingual Neural Machine Translation enables training a single model that supports translation from multiple source languages into multiple target languages. We perform extensive experiments in training massively multilingual NMT models, involving up to 103 distinct languages and 204 translation directions simultaneously. We explore different setups for training such models and analyze the trade-offs between translation quality and various modeling decisions. We report results on the publicly available TED talks multilingual corpus, showing that massively multilingual many-to-many models are effective in low-resource settings: they outperform the previous state of the art while supporting up to 59 languages in 116 translation directions in a single model. Our experiments on a large-scale dataset with 103 languages, 204 trained directions, and up to one million examples per direction also show promising results, surpassing strong bilingual baselines and encouraging future work on massively multilingual NMT.
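The abstract does not spell out the modeling recipe, but many-to-many systems of this kind are commonly built by prepending a target-language token to each source sentence and training a single encoder-decoder on the pooled corpora, following Google's multilingual NMT system (Johnson et al., 2016). The sketch below illustrates that data-preparation step only; the function names and the <2xx> token format are illustrative assumptions, not the paper's actual code.

```python
import random

def add_target_token(source: str, target_lang: str) -> str:
    """Prepend a token telling the model which language to translate into.

    The "<2xx>" format is an assumed convention for illustration.
    """
    return f"<2{target_lang}> {source}"

def mix_corpora(corpora: dict) -> list:
    """Flatten per-direction corpora into one multilingual training stream.

    `corpora` maps a (src_lang, tgt_lang) pair to a list of
    (source_sentence, target_sentence) examples. All directions share one
    model and one vocabulary; the target token disambiguates the direction.
    """
    examples = []
    for (src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            examples.append((add_target_token(src, tgt_lang), tgt))
    random.shuffle(examples)  # interleave directions within each epoch
    return examples

# Toy usage: two translation directions trained in a single model.
corpora = {
    ("en", "de"): [("hello world", "hallo welt")],
    ("de", "en"): [("hallo welt", "hello world")],
}
for src, tgt in mix_corpora(corpora):
    print(src, "->", tgt)
```

One consequence of this design, noted in the abstract's zero-shot and low-resource findings, is that every direction shares all parameters, so low-resource pairs can benefit from transfer across the pooled data.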
