Overcoming catastrophic forgetting in neural networks

Significance

Deep neural networks are currently the most successful machine-learning technique for solving a variety of tasks, including language translation, image classification, and image generation. One weakness of such models is that, unlike humans, they are unable to learn multiple tasks sequentially. In this work we propose a practical solution for training such models sequentially by protecting the weights important for previous tasks. This approach, inspired by synaptic consolidation in neuroscience, enables state-of-the-art results on multiple reinforcement learning problems experienced sequentially.

Abstract

The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Until now, neural networks have not been capable of this, and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks that they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate that our approach is scalable and effective by solving a set of classification tasks based on a handwritten-digit dataset and by learning several Atari 2600 games sequentially.
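To make the idea of "selectively slowing down learning on the weights important for those tasks" concrete, the snippet below is a minimal sketch of an EWC-style quadratic penalty in PyTorch: a diagonal Fisher estimate marks which weights mattered for the previous task, and a regularizer discourages them from moving while a new task is learned. The helper names (`estimate_diag_fisher`, `ewc_penalty`), the regularization strength `lam`, and the one-example-per-batch Fisher loop are illustrative assumptions, not details taken from this page.

```python
import torch
import torch.nn.functional as F


def estimate_diag_fisher(model, data_loader):
    """Diagonal Fisher estimate: mean squared gradient of the log-likelihood
    over old-task inputs. Assumes the loader yields one example per batch so
    gradients are per-example."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    model.eval()
    count = 0
    for x, _ in data_loader:
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=1)
        # One common choice: sample the label from the model's own predictions.
        label = torch.multinomial(log_probs.exp(), 1).squeeze(1)
        F.nll_loss(log_probs, label).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        count += 1
    return {n: f / max(count, 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """Quadratic penalty 0.5 * lam * sum_i F_i * (theta_i - theta_i*)^2 that
    slows learning on weights the Fisher marks as important for the old task."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty
```

When moving on to a new task, training would then minimize something like task_loss + ewc_penalty(model, fisher_old, params_old), where fisher_old and params_old (detached copies of the parameters) are recorded right after the previous task finishes.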
