Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization

Significance

Artificial neural networks can suffer from catastrophic forgetting, in which learning a new task causes the network to forget how to perform previous tasks. While previous studies have proposed various methods that can alleviate forgetting over small numbers (⩽10) of tasks, it is uncertain whether they can prevent forgetting across larger numbers of tasks. In this study, we propose a neuroscience-inspired scheme, called "context-dependent gating," in which mostly nonoverlapping sets of units are active for any one task. Importantly, context-dependent gating has a straightforward implementation, requires little extra computational overhead, and, when combined with previous methods to stabilize connection weights, can allow networks to maintain high performance across large numbers of sequentially presented tasks.

Abstract

Humans and most animals can learn new tasks without forgetting old ones. However, training artificial neural networks (ANNs) on new tasks typically causes them to forget previously learned tasks. This phenomenon is the result of "catastrophic forgetting," in which training an ANN disrupts connection weights that were important for solving previous tasks, degrading task performance. Several recent studies have proposed methods to stabilize connection weights of ANNs that are deemed most important for solving a task, which helps alleviate catastrophic forgetting. Here, drawing inspiration from algorithms that are believed to be implemented in vivo, we propose a complementary method: adding a context-dependent gating signal, such that only sparse, mostly nonoverlapping patterns of units are active for any one task. This method is easy to implement, requires little computational overhead, and allows ANNs to maintain high performance across large numbers of sequentially presented tasks, particularly when combined with weight stabilization. We show that this method works for both feedforward and recurrent network architectures, trained using either supervised or reinforcement-based learning. This suggests that using multiple, complementary methods, akin to what is believed to occur in the brain, can be a highly effective strategy to support continual learning.
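To make the gating scheme concrete, the sketch below applies a fixed, random binary mask per task to the hidden layer of a small feedforward network, so that only a sparse, mostly nonoverlapping subset of units is active for each task. This is a minimal illustration under stated assumptions, not the authors' released code: the layer sizes, the fraction of active units (keep_frac), the class name GatedMLP, and the use of PyTorch are all choices made here for clarity, and the weight-stabilization component (e.g., synaptic intelligence or elastic weight consolidation) would be added separately in the training loop.

```python
import numpy as np
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Feedforward network with context-dependent gating (illustrative sketch).

    For each task, a fixed random binary mask silences most hidden units,
    so different tasks use mostly nonoverlapping sets of units.
    """

    def __init__(self, n_in, n_hidden, n_out, n_tasks, keep_frac=0.2, seed=0):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)
        # One fixed binary mask per task; keep_frac (e.g., ~20% of units
        # active per task) is an assumed hyperparameter for this sketch.
        rng = np.random.default_rng(seed)
        masks = (rng.random((n_tasks, n_hidden)) < keep_frac).astype(np.float32)
        self.register_buffer("masks", torch.from_numpy(masks))

    def forward(self, x, task_id):
        h = torch.relu(self.fc1(x))
        h = h * self.masks[task_id]  # context-dependent gating signal
        return self.fc2(h)

# Example usage (hypothetical sizes): forward pass on task 3.
# net = GatedMLP(n_in=784, n_hidden=2000, n_out=10, n_tasks=100)
# logits = net(torch.randn(32, 784), task_id=3)
```

The same mask is reused whenever a given task is trained or tested, so updates for different tasks touch mostly disjoint sets of hidden units; in the paper, this gating is combined with weight stabilization to further protect the weights that remain shared across tasks.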
