Paper Information - Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture of a standard NMT system; instead, it introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables multilingual NMT with a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on the WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models, and we show some interesting examples of mixing languages.
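As a minimal sketch of the input-side trick described in the abstract, the snippet below prepends an artificial target-language token to each source sentence before it would be passed to an otherwise unchanged NMT model. The token format `<2xx>` and the helper names `add_target_token` and `build_training_examples` are assumptions made for illustration, and wordpiece segmentation and the model itself are left out.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token naming the desired target language."""
    # Assumption: the token looks like "<2es>"; the abstract only says that an
    # artificial token is placed at the beginning of the input sentence.
    return f"<2{target_lang}> {source_sentence}"


def build_training_examples(parallel_data):
    """Turn (source, target, target_lang) triples into (model_input, target) pairs."""
    return [
        (add_target_token(source, target_lang), target)
        for source, target, target_lang in parallel_data
    ]


# Hypothetical toy data: the same English source paired with two target languages.
examples = build_training_examples([
    ("Hello, how are you?", "Hola, ¿cómo estás?", "es"),
    ("Hello, how are you?", "Bonjour, comment allez-vous ?", "fr"),
])

for model_input, target in examples:
    print(model_input, "->", target)
```

Because the target language is signalled only through the input text, a single model can be trained on a mixture of language pairs. Requesting a pair that never appeared in training amounts to choosing a different token at inference time, which is the setting the abstract calls zero-shot translation.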
