Gradient Episodic Memory with a Soft Constraint for Continual Learning

Catastrophic forgetting is a destructive phenomenon in gradient-based neural networks trained on sequential tasks: performance on previous tasks drops sharply when the model learns a novel task. This stands in stark contrast to human learning, where knowledge is accumulated throughout life. To alleviate this problem, a model must be able both to learn new knowledge and to preserve learned knowledge. We propose adding a soft constraint $\epsilon \in [0, 1]$ to averaged gradient episodic memory (A-GEM), where $\epsilon$ acts as a balance factor between learning new knowledge and preserving learned knowledge; we call the resulting method gradient episodic memory with a soft constraint $\epsilon$ ($\epsilon$-SOFT-GEM). $\epsilon$-SOFT-GEM outperforms A-GEM and several continual-learning baselines in a single training epoch; it achieves state-of-the-art average accuracy while retaining A-GEM's computational and memory efficiency, and it provides a better trade-off between the stability of preserving learned knowledge and the plasticity of acquiring new knowledge.
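The abstract does not spell out the update rule, but A-GEM's hard constraint is well known: when the current-task gradient $g$ conflicts with the reference gradient $g_{ref}$ computed on episodic memory (i.e., $g^\top g_{ref} < 0$), $g$ is projected so the conflict is removed. Below is a minimal NumPy sketch of one plausible soft variant, in which $\epsilon$ linearly interpolates between the raw gradient and the A-GEM projection; the function name `soft_gem_update` and the interpolation rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_gem_update(g, g_ref, eps):
    """Combine the current-task gradient g with the episodic-memory
    reference gradient g_ref under a soft constraint eps in [0, 1].

    eps = 1 recovers the hard A-GEM projection; eps = 0 ignores the
    memory and keeps the plain gradient. (Hypothetical interpolation;
    the paper's exact rule may differ.)
    """
    dot = g @ g_ref
    if dot >= 0:
        # No interference with past tasks: keep the gradient as-is.
        return g
    # A-GEM projection: remove the component conflicting with g_ref.
    g_proj = g - (dot / (g_ref @ g_ref)) * g_ref
    # Soft constraint: blend plasticity (g) with stability (g_proj).
    return eps * g_proj + (1.0 - eps) * g

# Example: the two gradients conflict (negative dot product),
# so the update is partially projected.
g = np.array([1.0, -2.0])      # gradient on the new task
g_ref = np.array([0.5, 1.0])   # average gradient on memory samples
g_tilde = soft_gem_update(g, g_ref, eps=0.7)
```

In this reading, $\epsilon$ directly realizes the stability-plasticity trade-off described in the abstract: larger values enforce the memory constraint more strictly, smaller values favor progress on the new task.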
