Continual Learning in Task-Oriented Dialogue Systems

Continual learning in task-oriented dialogue systems allows us to add new domains and functionalities over time without incurring the high cost of retraining the whole system. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings: intent recognition, state tracking, natural language generation, and end-to-end. Moreover, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform comparably well, but both remain inferior to the multi-task learning baseline, in which all the data are shown at once, showing that continual learning in task-oriented dialogue systems is a challenging task. Furthermore, we reveal several trade-offs between different continual learning methods in terms of parameter usage and memory size, which are important considerations in the design of a task-oriented dialogue system. The proposed benchmark is released together with several baselines to promote more research in this direction.
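
To illustrate the architectural method, below is a minimal PyTorch sketch of a standard bottleneck residual adapter in the style of Houlsby et al. (2019). The hidden size d_model, bottleneck size d_bottleneck, and the placement of layer normalization are assumptions for illustration; the paper's exact configuration may differ. The idea in a continual learning setting is to train one small adapter per domain while keeping the pre-trained backbone frozen.

import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    # Bottleneck adapter inserted into a (frozen) Transformer layer.
    # Only the adapter parameters are trained for each new domain,
    # so weights learned for earlier domains are never overwritten.

    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, d_bottleneck)  # project down
        self.up = nn.Linear(d_bottleneck, d_model)    # project back up
        self.activation = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter learns a small correction
        # on top of the frozen backbone representation.
        residual = hidden_states
        x = self.layer_norm(hidden_states)
        x = self.up(self.activation(self.down(x)))
        return residual + x

At inference, the adapter matching the current domain would be selected (assuming the domain is known or predicted), which is what makes the approach immune to catastrophic forgetting at the cost of parameters that grow with the number of domains.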
