Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT systems using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-theart results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hints at a universal interlingua representation in our models and also show some interesting examples when mixing languages.

[1]  R. H. Richens Interlingual Machine Translation , 1958, Comput. J..

[2]  Harold L. Somers,et al.  An introduction to machine translation , 1992 .

[3]  Philip Gage,et al.  A new algorithm for data compression , 1994 .

[4]  William Smith,et al.  A short floating-point type in C++ , 1994 .

[5]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[6]  R. French Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[7]  Mark Steedman,et al.  Example Selection for Bootstrapping Statistical Parsers , 2003, NAACL.

[8]  G. B. Varile Multilingual Speech Processing , 2005 .

[9]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[10]  Nathan Schneider,et al.  Association for Computational Linguistics: Human Language Technologies , 2011 .

[11]  Mike Schuster,et al.  Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[13]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[14]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[15]  Yoshua Bengio,et al.  On Using Very Large Target Vocabulary for Neural Machine Translation , 2014, ACL.

[16]  Quoc V. Le,et al.  Addressing the Rare Word Problem in Neural Machine Translation , 2014, ACL.

[17]  Dianhai Yu,et al.  Multi-Task Learning for Multiple Language Translation , 2015, ACL.

[18]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[19]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[20]  Joost van de Weijer,et al.  Does Multimodality Help Human and Machine for Translation and Image Captioning? , 2016, WMT.

[21]  Yaser Al-Onaizan,et al.  Zero-Resource Translation with Multi-Lingual Neural Machine Translation , 2016, EMNLP.

[22]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[23]  Bo Wang,et al.  SYSTRAN's Pure Neural Machine Translation Systems , 2016, ArXiv.

[24]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[25]  Oriol Vinyals,et al.  Multilingual Language Processing From Bytes , 2015, NAACL.

[26]  Mamoru Komachi,et al.  Controlling the Voice of a Sentence in Japanese-to-English Neural Machine Translation , 2016, WAT@COLING.

[27]  Guillaume Lample,et al.  Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning , 2016, NAACL.

[28]  Yoshua Bengio,et al.  Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism , 2016, NAACL.

[29]  Wei Xu,et al.  Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation , 2016, TACL.

[30]  Rico Sennrich,et al.  Controlling Politeness in Neural Machine Translation via Side Constraints , 2016, NAACL.

[31]  Quoc V. Le,et al.  Multi-task Sequence to Sequence Learning , 2015, ICLR.

[32]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[33]  Kevin Knight,et al.  Multi-Source Neural Translation , 2016, NAACL.

[34]  Jason Lee,et al.  Fully Character-Level Neural Machine Translation without Explicit Segmentation , 2016, TACL.

[35]  Yoshua Bengio,et al.  Multi-way, multilingual neural machine translation , 2017, Comput. Speech Lang..