Importance-based Neuron Allocation for Multilingual Neural Machine Translation

Multilingual neural machine translation with a single model has drawn much attention due to its capability to handle multiple languages. However, the current multilingual translation paradigm tends to make the model preserve general knowledge while ignoring language-specific knowledge. Some previous works address this problem by adding various kinds of language-specific modules to the model, but they suffer from parameter explosion and require specialized manual design. To solve these problems, we propose to divide the model's neurons into general and language-specific parts based on their importance across languages. The general part is responsible for preserving general knowledge and participates in the translation of all languages, while the language-specific part is responsible for preserving language-specific knowledge and participates in the translation of only certain languages. Experimental results on several language pairs, covering the IWSLT and Europarl corpora, demonstrate the effectiveness and universality of the proposed method.
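To make the allocation idea concrete, below is a minimal PyTorch-style sketch of one way to score neuron importance per language and split neurons into a shared "general" set and per-language "specific" sets. The importance criterion (a first-order Taylor score), the general/specific split rule, and all function names are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of importance-based neuron allocation.
# Assumptions (not from the paper): a first-order Taylor importance score
# |activation * gradient|, and a simple top-k split into general vs. specific neurons.
import torch


def neuron_importance(activations: torch.Tensor, gradients: torch.Tensor) -> torch.Tensor:
    """Per-neuron importance for one language: |activation * gradient|,
    averaged over batch and sequence dimensions.
    activations, gradients: (batch, seq_len, hidden_dim)"""
    return (activations * gradients).abs().mean(dim=(0, 1))


def allocate_neurons(importance_per_lang: dict, general_ratio: float = 0.7):
    """Split neuron indices into a 'general' set (important for all languages)
    and per-language 'specific' sets (important mainly for one language)."""
    langs = list(importance_per_lang)
    stacked = torch.stack([importance_per_lang[l] for l in langs])  # (n_langs, hidden_dim)
    hidden_dim = stacked.size(1)

    # A neuron that every language relies on is 'general': rank by the
    # minimum importance across languages and keep the top share.
    min_importance = stacked.min(dim=0).values
    n_general = int(general_ratio * hidden_dim)
    general = set(min_importance.topk(n_general).indices.tolist())

    # Each remaining neuron is assigned to the language that relies on it most.
    specific = {l: [] for l in langs}
    for idx in range(hidden_dim):
        if idx not in general:
            owner = langs[stacked[:, idx].argmax().item()]
            specific[owner].append(idx)
    return general, specific
```

In use, importance scores would be accumulated separately for each language over its training batches, and the resulting masks would gate which neurons participate in the forward pass for a given language pair.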
