Multi-task Learning for Multilingual Neural Machine Translation

While monolingual data has been shown to be useful for improving bilingual neural machine translation (NMT), effectively and efficiently leveraging monolingual data for multilingual NMT (MNMT) systems remains a less explored area. In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on monolingual data. We conduct extensive empirical studies on MNMT systems with 10 language pairs from WMT datasets. We show that the proposed approach can effectively improve translation quality for both high-resource and low-resource languages by a large margin, achieving significantly better results than the individual bilingual models. We also demonstrate the efficacy of the proposed approach in the zero-shot setup for language pairs without bitext training data. Furthermore, we show the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual transfer learning on NLU tasks; the proposed approach outperforms massive-scale models trained on a single task.
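To make the joint objective concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of multi-task training with one translation loss on bitext and two denoising losses on monolingual text. The two denoising tasks are instantiated here as masked-token reconstruction and reconstruction from a locally shuffled input; the model, vocabulary, batch construction, and exact noise functions are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch of joint MTL training: translation + two denoising tasks.
# All sizes, data, and noise functions are hypothetical placeholders.
import torch
import torch.nn as nn

VOCAB, PAD, MASK = 1000, 0, 1  # toy vocabulary; ids 0/1 reserved for pad/mask

class TinySeq2Seq(nn.Module):
    """Small encoder-decoder standing in for the shared MNMT model."""
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model, padding_idx=PAD)
        self.transformer = nn.Transformer(d_model, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.proj = nn.Linear(d_model, VOCAB)

    def forward(self, src, tgt_in):
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt_in), tgt_mask=causal)
        return self.proj(h)

def seq2seq_loss(model, src, tgt):
    """Teacher-forced cross-entropy; the same loss form is shared by all tasks."""
    logits = model(src, tgt[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1), ignore_index=PAD)

def mask_tokens(x, p=0.15):
    """Masked-token denoising input: replace roughly p of the tokens with MASK."""
    noisy = x.clone()
    noisy[torch.rand_like(x, dtype=torch.float) < p] = MASK
    return noisy

def shuffle_tokens(x, k=3):
    """Denoising-autoencoder input: locally permute tokens within a window of k
    (same permutation for every sentence in the batch, for simplicity)."""
    idx = torch.argsort(torch.arange(x.size(1)).float() + k * torch.rand(x.size(1)))
    return x[:, idx]

model = TinySeq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):
    # Hypothetical batches: (src, tgt) bitext and monolingual sentences.
    src = torch.randint(2, VOCAB, (8, 20))
    tgt = torch.randint(2, VOCAB, (8, 20))
    mono = torch.randint(2, VOCAB, (8, 20))

    loss = (seq2seq_loss(model, src, tgt)                       # translation on bitext
            + seq2seq_loss(model, mask_tokens(mono), mono)      # masked-token denoising
            + seq2seq_loss(model, shuffle_tokens(mono), mono))  # denoising autoencoder
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the three losses would be weighted and the bitext/monolingual batches sampled according to a schedule across languages; the sketch simply sums them to show how a single shared model is updated by all tasks in one step.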
