Towards Interlingua Neural Machine Translation

A common intermediate language representation in neural machine translation can be used to extend bilingual systems to multilingual ones by incremental training. In this paper, we propose a new architecture based on introducing an interlingual loss as an additional training objective. By adding and enforcing this interlingual loss, we are able to train multiple encoders and decoders, one per language, that share a common intermediate representation. Translation results on low-resourced tasks (Turkish-English and Kazakh-English, from the popular Workshop on Machine Translation benchmark) show BLEU improvements of up to 2.8 points. However, results on a larger dataset (Russian-English and Kazakh-English, from the same benchmark) show BLEU losses of a similar magnitude. While our system only provides translation-quality improvements for the low-resourced tasks, it is capable of quickly deploying new language pairs without retraining the rest of the system, which may be a game-changer in some situations (e.g., a disaster crisis where international help is required for a small region, or developing a translation system for a client on short notice). Precisely, what is most relevant about our architecture is that it is capable of: (1) reducing the number of production systems, with respect to the number of languages, from quadratic to linear; (2) incrementally adding a new language to the system without retraining the languages already present; and (3) allowing translations from the new language to all the other languages present in the system.
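The training objective described above can be sketched as the usual translation loss plus a term that penalizes distance between the two encoders' sentence representations. This is a minimal illustrative sketch, not the paper's implementation: the choice of mean squared distance, the weight `lam`, and all function names are assumptions for illustration only.

```python
import numpy as np

def interlingua_loss(enc_src, enc_tgt):
    """Distance between the source- and target-encoder sentence
    representations. Mean squared distance is an assumed choice;
    the paper's actual distance metric may differ."""
    return float(np.mean((enc_src - enc_tgt) ** 2))

def total_loss(translation_loss, enc_src, enc_tgt, lam=1.0):
    """Combined objective: translation loss plus the interlingual
    term, weighted by lam (weight and its name are assumptions)."""
    return translation_loss + lam * interlingua_loss(enc_src, enc_tgt)

# Toy fixed-size sentence representations from two encoders
# (e.g., English and Turkish) for a parallel sentence pair.
enc_en = np.array([0.2, 0.5, -0.1])
enc_tr = np.array([0.1, 0.4, 0.0])

loss = total_loss(2.0, enc_en, enc_tr, lam=0.5)
```

Driving `interlingua_loss` toward zero is what forces the encoders of different languages to emit the shared intermediate representation, which in turn lets a new encoder be trained against a frozen system without retraining the existing decoders.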
