AFEC: Active Forgetting of Negative Transfer in Continual Learning

Continual learning aims to learn a sequence of tasks from dynamic data distributions. Without access to the old training samples, knowledge transfer from the old tasks to each new task is difficult to determine and might be either positive or negative. If the old knowledge interferes with the learning of a new task, i.e., the forward knowledge transfer is negative, then precisely remembering the old tasks will further aggravate the interference and thus degrade the performance of continual learning. By contrast, biological neural networks can actively forget old knowledge that conflicts with the learning of a new experience, by regulating learning-triggered synaptic expansion and synaptic convergence. Inspired by this biological active forgetting, we propose to actively forget the old knowledge that limits the learning of new tasks, to the benefit of continual learning. Under the framework of Bayesian continual learning, we develop a novel approach named Active Forgetting with synaptic Expansion-Convergence (AFEC). Our method dynamically expands parameters to learn each new task and then selectively combines them, which is formally consistent with the underlying mechanism of biological active forgetting. We extensively evaluate AFEC on a variety of continual learning benchmarks, including CIFAR-10 regression tasks, visual classification tasks, and Atari reinforcement tasks, where AFEC effectively improves the learning of new tasks and achieves state-of-the-art performance in a plug-and-play way.
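To make the expansion-convergence idea concrete, the sketch below illustrates, in PyTorch, one plausible form of a quadratic regularizer under the Bayesian (EWC-style) view: one anchor keeps parameters close to the consolidated old-task solution, and a second anchor pulls them toward temporarily expanded parameters fit on the new task alone. This is a minimal sketch under assumptions; the function name `afec_penalty`, the dict-of-tensors layout, and the coefficients `lam`/`lam_e` are hypothetical and do not reproduce the paper's exact implementation.

```python
import torch


def afec_penalty(params, old_params, old_fisher, new_params, new_fisher,
                 lam=1.0, lam_e=1.0):
    """Illustrative quadratic surrogate (not the official AFEC code).

    All arguments are dicts mapping parameter names to tensors:
    - old_params / old_fisher: consolidated old-task parameters and their
      Fisher information (the usual EWC-style stability anchor).
    - new_params / new_fisher: expanded parameters trained on the new task
      alone and the new task's Fisher information (the "active forgetting"
      anchor that encourages convergence toward the new-task solution).
    """
    penalty = 0.0
    for name, p in params.items():
        # Stability term: stay close to the old-task solution where the
        # old tasks' Fisher information says parameters matter.
        penalty = penalty + (lam / 2) * (
            old_fisher[name] * (p - old_params[name]) ** 2
        ).sum()
        # Plasticity term: stay close to the expanded new-task parameters
        # where the new task's Fisher information says parameters matter.
        penalty = penalty + (lam_e / 2) * (
            new_fisher[name] * (p - new_params[name]) ** 2
        ).sum()
    return penalty
```

In such a setup, the returned penalty would simply be added to the new task's training loss before backpropagation, so the trade-off between remembering old tasks and forgetting interfering knowledge is controlled by the two coefficients.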
