Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation

Neural machine translation (NMT) models typically suffer from catastrophic forgetting during continual training: as they are fine-tuned on newly added data with a different distribution, e.g. a different domain, they gradually forget previously learned knowledge in favor of the new data. Although many methods have been proposed to alleviate this problem, its underlying cause remains poorly understood. In the setting of domain adaptation, we investigate the cause of catastrophic forgetting from the perspectives of modules and of parameters (neurons). The module-level analysis of the NMT model shows that some modules are closely tied to general-domain knowledge, while others are more essential for domain adaptation. The parameter-level analysis shows that some parameters are important for both general-domain and in-domain translation, and that their large changes during continual training bring about the performance decline on the general domain. We conduct experiments across different language pairs and domains to ensure the validity and reliability of our findings.
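To make the parameter-level analysis concrete, the following is a minimal sketch, not the paper's exact procedure: it loads a general-domain checkpoint and a continually trained (in-domain) checkpoint and reports, per module, how much the parameters have drifted. The checkpoint paths, the assumption that the checkpoints store their weights under a "model" key (as fairseq checkpoints do), and the relative-L2 drift metric are all illustrative assumptions.

```python
# Hypothetical sketch: measure per-module parameter drift between a
# general-domain NMT checkpoint and a continually trained (in-domain) one.
# Paths, checkpoint layout, and the drift metric are assumptions, not the
# paper's actual method.
import torch
from collections import defaultdict

general = torch.load("checkpoint_general.pt", map_location="cpu")["model"]
adapted = torch.load("checkpoint_indomain.pt", map_location="cpu")["model"]

drift_by_module = defaultdict(list)
for name, w_gen in general.items():
    w_new = adapted[name]
    # Relative L2 change of this parameter tensor.
    change = (w_new.float() - w_gen.float()).norm() / (w_gen.float().norm() + 1e-8)
    # Group by a module prefix, e.g. "encoder.layers.0.self_attn".
    module = ".".join(name.split(".")[:4])
    drift_by_module[module].append(change.item())

# Modules whose parameters change most during continual training are
# candidates for being tied to in-domain adaptation; those that change
# little are candidates for carrying general-domain knowledge.
for module, changes in sorted(drift_by_module.items(),
                              key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{module:50s} mean relative change = {sum(changes)/len(changes):.4f}")
```

A drift statistic like this only indicates where change happens; attributing general-domain or in-domain importance to specific modules or parameters, as the paper does, additionally requires measuring how translation quality responds when those components are frozen or perturbed.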
