Dynamic Consolidation for Continual Learning

Abstract

Training deep learning models from a stream of nonstationary data is a key challenge on the path toward general artificial intelligence. Continual learning (CL) is a promising solution: it aims to build intelligent systems with the plasticity to learn from new information without forgetting previously acquired knowledge. Unfortunately, existing CL methods face two nontrivial limitations. First, when updating a model with new data, they typically constrain the parameters to the vicinity of those optimized for the old data, which limits the model's ability to explore the parameter space. Second, the importance strength assigned to each parameter (used to consolidate previously learned knowledge) is fixed, and is therefore suboptimal under dynamic parameter updates. To address these limitations, we first relax the vicinity constraint with a global definition of the importance strength, which allows the model to explore the full parameter space: we define the importance strength of a parameter as the sensitivity of the global loss function to that parameter. We further propose adjusting the importance strength adaptively so that it stays aligned with the dynamic parameter updates. Extensive experiments on popular datasets show that our method outperforms strong baselines by up to 24% in average accuracy.
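
The abstract does not spell out the exact objective, so the sketch below only illustrates the general recipe it describes: per-parameter importance strengths estimated as the sensitivity of a loss to each parameter, a quadratic consolidation penalty weighted by those strengths, and a periodic re-estimation step standing in for the adaptive adjustment. All function names, the squared-gradient sensitivity estimator, the exponential-moving-average blend, and the hyperparameters are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch (PyTorch assumed): sensitivity-based importance strengths
# plus a consolidation penalty, with periodic adaptive re-estimation.
import torch


def estimate_importance(model, data_loader, loss_fn, device="cpu"):
    """Assumed estimator: importance of a parameter = average squared gradient
    of the loss w.r.t. that parameter (a proxy for loss sensitivity)."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in data_loader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: v / max(n_batches, 1) for n, v in importance.items()}


def consolidation_penalty(model, old_params, importance):
    """Quadratic penalty discouraging changes to parameters the old-task loss
    is sensitive to (EWC-style surrogate, used purely for illustration)."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (importance[n] * (p - old_params[n]) ** 2).sum()
    return penalty


def train_new_task(model, new_loader, old_loader, loss_fn, optimizer,
                   lam=1.0, beta=0.9, refresh_every=100, epochs=1, device="cpu"):
    """Train on a new task with (a) an importance-weighted drift penalty and
    (b) periodic re-estimation of the importance strengths, blended by an
    exponential moving average. The refresh schedule and blend are assumptions."""
    old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    importance = estimate_importance(model, old_loader, loss_fn, device)
    step = 0
    for _ in range(epochs):
        for x, y in new_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y) \
                + lam * consolidation_penalty(model, old_params, importance)
            loss.backward()
            optimizer.step()
            step += 1
            if step % refresh_every == 0:  # stand-in for adaptive adjustment
                fresh = estimate_importance(model, old_loader, loss_fn, device)
                importance = {n: beta * importance[n] + (1 - beta) * fresh[n]
                              for n in importance}
    return model
```

The periodic re-estimation is only one plausible way to keep the importance strengths aligned with the evolving parameters; the paper's adaptive rule may differ, and the penalty form shown here is the standard quadratic surrogate rather than the authors' specific formulation.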
