Modelling continual learning in humans with Hebbian context gating and exponentially decaying task signals

Humans can learn several tasks in succession with minimal mutual interference, but perform more poorly when trained on multiple tasks at once. The opposite is true for standard deep neural networks. Here, we propose novel computational constraints for artificial neural networks, inspired by earlier work on gating in the primate prefrontal cortex, that capture the cost of interleaved training and allow the network to learn two tasks in sequence without forgetting. We augment standard stochastic gradient descent with two algorithmic motifs: so-called "sluggish" task units, and a Hebbian training step that strengthens connections between task units and those hidden units that encode task-relevant information. We found that the "sluggish" units introduce a switch cost during training, which biases representations under interleaved training towards a joint representation that ignores the contextual cue. The Hebbian step, in turn, promotes the formation of a gating scheme from task units to the hidden layer that produces orthogonal representations, which are perfectly shielded against interference. Validating the model on previously published human behavioural data revealed that it matches the performance of participants who had been trained on blocked or interleaved curricula, and that these performance differences were driven by misestimation of the true category boundary.
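
To make the two motifs concrete, the following minimal numpy sketch shows one way they could be combined with a standard hidden layer. The layer sizes, the decay constant ALPHA, the learning rate ETA, the additive ReLU gating, and the Oja-flavoured column normalisation are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    N_TASK, N_HIDDEN = 2, 100   # two contexts; sizes are illustrative
    ALPHA, ETA = 0.2, 0.01      # assumed sluggishness and Hebbian learning rate

    W_ctx = rng.normal(scale=0.1, size=(N_HIDDEN, N_TASK))  # task-unit-to-hidden weights
    z = np.zeros(N_TASK)                                    # sluggish task units

    def sluggish_cue(z, cue, alpha=ALPHA):
        # Exponentially decaying task signal: the task units are a leaky
        # average of recent one-hot context cues, so after a task switch the
        # old context lingers and blurs into the new one.
        return (1.0 - alpha) * z + alpha * cue

    def hebbian_gate_update(W_ctx, h, z, eta=ETA):
        # Hebbian step: strengthen weights between the currently active task
        # unit and the hidden units it co-activates (outer product), then
        # renormalise each column so that repeated updates settle into a
        # stable gating scheme instead of growing without bound (an
        # Oja-flavoured normalisation, assumed here for stability).
        W_new = W_ctx + eta * np.outer(h, z)
        norms = np.linalg.norm(W_new, axis=0, keepdims=True)
        return W_new / np.maximum(norms, 1e-8)

    # One trial in context 0:
    cue = np.array([1.0, 0.0])
    z = sluggish_cue(z, cue)                 # context signal decays towards the current cue
    x = rng.normal(size=N_HIDDEN)            # stand-in for the feedforward stimulus drive
    h = np.maximum(x + W_ctx @ z, 0.0)       # hidden activity, additively gated by task units
    W_ctx = hebbian_gate_update(W_ctx, h, z)

Under frequent task switches, a small ALPHA means the task signal never fully settles on the current context, yielding the mixed, cue-ignoring representations described above; under blocked training the signal has time to saturate, so the Hebbian gating can carve out orthogonal, interference-free subspaces.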
