Gradient Projection Memory for Continual Learning

The ability to learn continually without forgetting past tasks is a desired attribute of artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance-based weight updates, or replay of old data from memory. In contrast, we propose a novel approach in which a neural network learns new tasks by taking gradient steps orthogonal to the gradient subspaces deemed important for past tasks. We find the bases of these subspaces by analyzing network representations (activations) after learning each task with Singular Value Decomposition (SVD) in a single-shot manner and store them in memory as the Gradient Projection Memory (GPM). Through qualitative and quantitative analyses, we show that such orthogonal gradient descent induces minimal to no interference with past tasks, thereby mitigating forgetting. We evaluate our algorithm on diverse image classification datasets with short and long task sequences and report better or on-par performance compared to state-of-the-art approaches.
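
The mechanism described above, finding important subspaces via SVD of layer activations and then restricting later updates to their orthogonal complement, can be illustrated with a minimal NumPy sketch. The function names, the energy threshold, and the exact basis-selection criterion below are illustrative assumptions rather than the paper's precise algorithm.

```python
import numpy as np

def update_memory(activations, memory_basis=None, energy_threshold=0.97):
    """Add basis vectors spanning the significant activation subspace of a task.

    activations:  (features x samples) matrix of layer activations collected
                  after training on the current task.
    memory_basis: (features x k) matrix of previously stored basis vectors,
                  or None for the first task.
    The energy threshold and selection rule here are illustrative, not the
    paper's exact criterion.
    """
    if memory_basis is not None:
        # Remove the part of the activations already spanned by the memory,
        # so only genuinely new directions are added.
        activations = activations - memory_basis @ (memory_basis.T @ activations)
    U, S, _ = np.linalg.svd(activations, full_matrices=False)
    if not np.any(S):
        # Nothing new to add; the memory already covers these activations.
        return memory_basis
    # Keep enough leading singular directions to capture the chosen
    # fraction of the (residual) activation energy.
    energy = np.cumsum(S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(energy, energy_threshold)) + 1
    new_basis = U[:, :k]
    if memory_basis is None:
        return new_basis
    # Residual singular vectors are orthogonal to the stored basis, so the
    # stacked matrix remains (approximately) orthonormal.
    return np.hstack([memory_basis, new_basis])

def project_gradient(grad, memory_basis):
    """Take a step orthogonal to the stored subspace: g <- g - M (M^T g)."""
    if memory_basis is None:
        return grad
    return grad - memory_basis @ (memory_basis.T @ grad)
```

In GPM-style training, one would apply a projection like project_gradient to each layer's (flattened) gradient before every optimizer step on any task after the first, and call update_memory on sampled activations once training on a task finishes.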
