Continual learning: A comparative study on how to defy forgetting in classification tasks

Artificial neural networks excel at classification for a single, fixed task, where the network acts as a static repository of knowledge acquired through generalized learning in a distinct training phase. However, attempts to extend this knowledge without revisiting the original task typically result in catastrophic forgetting of that task. Continual learning shifts this paradigm towards a network that can continually accumulate knowledge over different tasks without the need to retrain from scratch, with methods specifically aiming to alleviate forgetting. We focus on task-incremental classification, where tasks arrive in a batch-like fashion and are delineated by clear boundaries. Our main contributions are: 1) a taxonomy and extensive overview of the state-of-the-art; 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner; and 3) a comprehensive experimental comparison of 10 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize which method performs best, both on the balanced Tiny ImageNet dataset and on the large-scale, unbalanced iNaturalist dataset. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and qualitatively compare the methods in terms of required memory, computation time and storage.
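To make the task-incremental setting concrete, the sketch below shows the naive fine-tuning baseline that continual learning methods aim to improve upon: one network is trained on a sequence of tasks with clear boundaries and no access to earlier data, and all tasks seen so far are re-evaluated after each task to expose forgetting. This is an illustrative sketch, not the paper's code; the model, the per-task data loaders, and the hyperparameters are assumed to be supplied by the user.

```python
# Minimal sketch of task-incremental training (naive fine-tuning, no
# anti-forgetting mechanism). `model` and `task_loaders` are assumed inputs.
import torch
import torch.nn as nn


@torch.no_grad()
def evaluate(model, loader):
    """Top-1 accuracy of `model` on one task's evaluation loader."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    model.train()
    return correct / total


def train_task_incremental(model, task_loaders, eval_loaders, epochs=10, lr=0.01):
    """Train sequentially on tasks delineated by clear boundaries.

    Earlier tasks are never revisited, so plain SGD overwrites the weights
    that encoded them: this is the catastrophic forgetting the surveyed
    methods try to alleviate.
    """
    criterion = nn.CrossEntropyLoss()
    accuracies = []  # accuracies[t] = per-task accuracy after finishing task t
    for task_id, loader in enumerate(task_loaders):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                optimizer.zero_grad()
                loss = criterion(model(x), y)
                loss.backward()
                optimizer.step()
        # Re-evaluate all tasks seen so far; drops in earlier columns of
        # this matrix quantify forgetting.
        accuracies.append([evaluate(model, l) for l in eval_loaders[:task_id + 1]])
    return accuracies
```

A continual learning method would modify the inner loop, e.g. by adding a regularization term on weights important for earlier tasks or by replaying stored exemplars, rather than changing the overall task-sequential protocol.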
