Paper information - Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models

In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We demonstrate the forgetting phenomenon through a set of detailed behavior analyses from the perspectives of knowledge transfer, context sensitivity, and function space projection. As a preliminary attempt to alleviate the forgetting problem, we propose an intuitive finetuning strategy named “mix-review”. We find that mix-review effectively regularizes the finetuning process and alleviates the forgetting problem to some extent. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.

Myle Ott
