Adaptive Group Sparse Regularization for Continual Learning

We propose a novel regularization-based continual learning method, dubbed Adaptive Group Sparsity based Continual Learning (AGS-CL), which uses two group sparsity-based penalties. Our method selectively applies the two penalties to each node based on its importance, which is adaptively updated after learning each new task. By using proximal gradient descent for learning, exact sparsity and freezing of the model are guaranteed, and thus the learner can explicitly control the model capacity as learning continues. Furthermore, as a critical detail, we re-initialize the weights associated with unimportant nodes after learning each task in order to prevent the negative transfer that causes catastrophic forgetting and to facilitate efficient learning of new tasks. Through extensive experiments, we show that AGS-CL uses much less additional memory for storing regularization parameters and significantly outperforms several state-of-the-art baselines on representative continual learning benchmarks for both supervised and reinforcement learning tasks.
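To make the proximal-update mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of one proximal gradient step with a plain group-Lasso penalty applied to the per-node (row-wise) weight groups of a single linear layer. The helper name `group_prox`, the step size `lr`, and the penalty strength `mu` are illustrative assumptions, not the authors' implementation; AGS-CL itself uses two importance-weighted penalties rather than this single uniform one. The sketch only shows why the proximal (group soft-thresholding) step zeroes out entire node groups exactly, which is what enables explicit capacity control and node freezing.

```python
import torch

def group_prox(weight, reg_strength, lr):
    """Group-wise soft-thresholding: the proximal operator of a group-Lasso penalty.

    Each row of `weight` is treated as one node's incoming-weight group; rows whose
    L2 norm falls below lr * reg_strength are set exactly to zero, yielding exact
    node-level sparsity rather than merely small weights.
    """
    norms = weight.norm(dim=1, keepdim=True)                      # per-node group norms
    scale = torch.clamp(1.0 - lr * reg_strength / (norms + 1e-12), min=0.0)
    return weight * scale

# One illustrative training step: an ordinary gradient step on the data loss,
# followed by the proximal step that enforces group sparsity exactly.
layer = torch.nn.Linear(100, 50)
x, y = torch.randn(32, 100), torch.randn(32, 50)
lr, mu = 0.1, 0.01                                                # mu: assumed sparsity strength

loss = torch.nn.functional.mse_loss(layer(x), y)
loss.backward()
with torch.no_grad():
    layer.weight -= lr * layer.weight.grad
    layer.weight.copy_(group_prox(layer.weight, mu, lr))
    layer.weight.grad.zero_()

# Nodes whose rows are driven exactly to zero can be re-initialized before the
# next task, in the spirit of the re-initialization step described above.
```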
