A Study of Multilingual Neural Machine Translation

Multilingual neural machine translation (NMT) has recently been investigated from different perspectives (e.g., pivot translation, zero-shot translation, fine-tuning, and training from scratch) and in different settings (e.g., rich-resource and low-resource, one-to-many, and many-to-one translation). This paper aims at a deep understanding of multilingual NMT and conducts a comprehensive study on a multilingual dataset covering more than 20 languages. Our results show that (1) low-resource language pairs benefit greatly from multilingual training, while rich-resource language pairs may suffer under limited model capacity, and training with similar languages helps more than training with dissimilar ones; (2) fine-tuning performs better than training from scratch in the one-to-many setting, while training from scratch performs better in the many-to-one setting; (3) the bottom layers of the encoder and the top layers of the decoder capture more language-specific information, and fine-tuning only these parts can achieve good accuracy for low-resource language pairs; (4) direct translation is better than pivot translation when the source language is similar to the target language (e.g., in the same language branch), even when the direct training data is much smaller; (5) given a fixed training data budget, it is better to introduce more languages into multilingual training for zero-shot translation.
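Finding (3) suggests a practical recipe for adapting a pretrained multilingual model to a new low-resource pair: freeze most parameters and update only the layers that appear to carry language-specific information. The sketch below illustrates this in PyTorch under stated assumptions; the attribute names (`model.encoder.layers`, `model.decoder.layers`), layer counts, and the model/data loaders are hypothetical and not the paper's released code.

```python
# Minimal sketch of partial fine-tuning: train only the bottom encoder layers
# and the top decoder layers of a pretrained multilingual Transformer.
# Assumes the model exposes ModuleLists at model.encoder.layers / model.decoder.layers.

import torch
import torch.nn as nn


def select_finetune_params(model: nn.Module,
                           num_bottom_encoder: int = 2,
                           num_top_decoder: int = 2):
    """Freeze everything, then unfreeze the bottom encoder layers and the
    top decoder layers; return the list of trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False

    encoder_layers = model.encoder.layers          # assumed attribute name
    decoder_layers = model.decoder.layers          # assumed attribute name
    trainable_layers = list(encoder_layers[:num_bottom_encoder]) + \
                       list(decoder_layers[-num_top_decoder:])
    for layer in trainable_layers:
        for p in layer.parameters():
            p.requires_grad = True

    return [p for p in model.parameters() if p.requires_grad]


# Hypothetical usage on a low-resource language pair:
# model = load_pretrained_multilingual_transformer()          # placeholder loader
# params = select_finetune_params(model, num_bottom_encoder=2, num_top_decoder=2)
# optimizer = torch.optim.Adam(params, lr=1e-4)
# for src_batch, tgt_batch in low_resource_loader:            # placeholder loader
#     loss = model(src_batch, tgt_batch)                      # assumes model returns a loss
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
```

Because only a small fraction of the parameters receives gradients, this kind of adaptation is cheap and less prone to overfitting on the limited parallel data of a low-resource pair.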
