Overcoming catastrophic forgetting problem by weight consolidation and long-term memory

Sequential learning of multiple tasks in artificial neural networks using gradient descent leads to catastrophic forgetting, whereby previously learned knowledge is erased during learning of new, disjoint knowledge. Here, we propose a new approach to sequential learning that leverages the recent discovery of adversarial examples. We use adversarial subspaces from previous tasks to enable learning of new tasks with less interference. We apply our method to sequentially learning to classify digits 0, 1, 2 (task 1), 4, 5, 6 (task 2), and 7, 8, 9 (task 3) in MNIST (disjoint MNIST task). We compare and combine our Adversarial Direction (AD) method with the recently proposed Elastic Weight Consolidation (EWC) method for sequential learning. We train on each task for 20 epochs, which yields good initial performance (99.24% correct on task 1). After training task 2 and then task 3, both plain gradient descent (PGD) and EWC largely forget task 1 (task 1 accuracy 32.95% for PGD and 41.02% for EWC), while our combined approach (AD+EWC) still achieves 94.53% correct on task 1. We obtain similar results with a much more difficult disjoint CIFAR10 task, which to our knowledge had not been attempted before (70.10% initial task 1 performance, 67.73% after learning tasks 2 and 3 for AD+EWC, while PGD and EWC both fall to chance level). Our results suggest that AD+EWC can provide better sequential learning performance than either PGD or EWC.
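The abstract names the two algorithmic ingredients that AD+EWC combines: adversarial directions (obtainable, e.g., via the fast gradient sign method of Goodfellow et al.) and the quadratic weight-consolidation penalty of EWC (Kirkpatrick et al.). The following is a minimal PyTorch sketch of those two pieces only, not the authors' implementation: the network architecture, eps, lambda, and the uniform placeholder Fisher matrix are illustrative assumptions, and the construction of full adversarial subspaces from the individual directions is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier standing in for the MNIST network (architecture is assumed).
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 10))

def fgsm_direction(model, x, y, eps=0.1):
    """FGSM adversarial direction: eps * sign(grad_x loss).
    Adding it to x yields an adversarial example; the AD method
    builds its subspaces out of such per-task directions."""
    x = x.detach().clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return eps * x.grad.sign()

def ewc_penalty(model, fisher, anchor, lam=100.0):
    """EWC penalty (lam/2) * sum_i F_i * (theta_i - theta*_i)^2,
    pulling weights back toward their post-task-1 values theta*."""
    return 0.5 * lam * sum((fisher[n] * (p - anchor[n]) ** 2).sum()
                           for n, p in model.named_parameters())

# Toy usage with random tensors standing in for an MNIST batch.
x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
anchor = {n: p.detach().clone() for n, p in net.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in net.named_parameters()}  # placeholder Fisher
loss = F.cross_entropy(net(x), y) + ewc_penalty(net, fisher, anchor)
loss.backward()
adv_dir = fgsm_direction(net, x, y)  # one adversarial direction per input
```

In practice the Fisher term would be estimated from squared log-likelihood gradients on task-1 data rather than set to ones; the placeholder is only there to keep the sketch self-contained.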
