Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation

Domain adaptation is widely used in practical applications of neural machine translation, where the goal is to achieve good performance on both general-domain and in-domain data. However, existing domain adaptation methods usually suffer from catastrophic forgetting, large domain divergence, and model explosion. To address these three problems, we propose a "divide and conquer" method based on the importance of neurons or parameters to the translation model. We first prune the model, keeping only the important neurons or parameters and making them responsible for both general-domain and in-domain translation. We then further train the pruned model under the supervision of the original full model via knowledge distillation. Finally, we expand the model back to its original size and fine-tune the added parameters for in-domain translation. Experiments on different language pairs and domains show that our method achieves significant improvements over several strong baselines.
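The sketch below is a minimal illustration of the prune-then-expand idea described in the abstract, not the authors' implementation: importance is approximated by weight magnitude, the "model" is a toy linear layer rather than a Transformer NMT system, and all function and variable names (magnitude_prune_masks, distill_step, in_domain_step, keep_ratio) are hypothetical. It assumes plain SGD with no weight decay so that masked weights stay at zero during distillation.

```python
# Hypothetical sketch of prune -> distill -> expand-and-fine-tune.
import torch
import torch.nn.functional as F


def magnitude_prune_masks(model, keep_ratio=0.7):
    """Keep the keep_ratio largest-magnitude weights of each matrix;
    these kept weights stand in for the 'important' general-domain parameters."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:  # skip biases / layer-norm gains
            continue
        k = max(1, int(p.numel() * keep_ratio))
        # threshold = (numel - k + 1)-th smallest absolute value
        threshold = p.detach().abs().flatten().kthvalue(p.numel() - k + 1).values
        masks[name] = (p.detach().abs() >= threshold).float()
    return masks


def apply_masks(model, masks):
    """Zero out the pruned (less important) weights."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])


def distill_step(student, teacher, x, optimizer, masks, T=2.0):
    """Train the pruned student on general-domain data, supervised by the
    full teacher via knowledge distillation; gradients of pruned slots are
    zeroed so those weights remain zero."""
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    for name, p in student.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(masks[name])  # only kept weights are updated
    optimizer.step()
    return loss.item()


def in_domain_step(model, x, y, optimizer, masks):
    """Expand: re-activate the pruned slots and fine-tune ONLY them on
    in-domain data, leaving the general-domain weights untouched."""
    logits = model(x)
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(1.0 - masks[name])  # only previously pruned slots move
    optimizer.step()
    return loss.item()


# Toy usage: a linear layer standing in for the NMT model.
teacher = torch.nn.Linear(16, 8)
student = torch.nn.Linear(16, 8)
student.load_state_dict(teacher.state_dict())
masks = magnitude_prune_masks(student, keep_ratio=0.7)
apply_masks(student, masks)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
distill_step(student, teacher, torch.randn(4, 16), opt, masks)
in_domain_step(student, torch.randn(4, 16), torch.randint(0, 8, (4,)), opt, masks)
```

Gradient masking is used here as a simple stand-in for keeping the general-domain parameters fixed while the re-introduced parameters specialize to the in-domain data.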
