Organizing recurrent network dynamics by task-computation to enable continual learning

Biological systems face dynamic environments that require continual learning. It is not well understood how these systems balance the tension between flexibility for learning and robustness for memory of previous behaviors. Continual learning without catastrophic interference also remains a challenging problem in machine learning. Here, we develop a novel learning rule designed to minimize interference between sequentially learned tasks in recurrent networks. Our learning rule preserves network dynamics within activity-defined subspaces used for previously learned tasks. It encourages dynamics associated with new tasks that might otherwise interfere to instead explore orthogonal subspaces, and it allows for reuse of previously established dynamical motifs where possible. Employing a set of tasks used in neuroscience, we demonstrate that our approach successfully eliminates catastrophic interference and offers a substantial improvement over previous continual learning algorithms. Using dynamical systems analysis, we show that networks trained using our approach can reuse similar dynamical structures across similar tasks. This possibility for shared computation allows for faster learning during sequential training. Finally, we identify organizational differences that emerge when training tasks sequentially versus simultaneously.
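The abstract describes the approach only at a high level. As a rough illustration of how dynamics can be preserved within activity-defined subspaces while new learning is pushed into orthogonal directions, the sketch below projects a candidate recurrent-weight update off the subspace spanned by hidden activity recorded on a previously learned task. This is a minimal sketch of the general projection idea, not the paper's exact learning rule; the function names (`activity_subspace_projector`, `project_recurrent_update`), the vanilla-RNN setting, and the variance threshold are illustrative assumptions.

```python
# Illustrative sketch (not the paper's exact rule): confine recurrent-weight
# updates to directions orthogonal to the activity subspace of an old task.
import numpy as np

rng = np.random.default_rng(0)

def activity_subspace_projector(activity, var_threshold=0.99):
    """Projector onto the principal subspace of recorded hidden-state activity.

    activity : (time, units) array of hidden states from an already-learned task.
    Returns the (units, units) orthogonal projector onto the top principal
    directions capturing `var_threshold` of the activity variance.
    """
    cov = activity.T @ activity
    eigvecs, eigvals, _ = np.linalg.svd(cov)           # columns = principal directions
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(explained, var_threshold)) + 1
    basis = eigvecs[:, :k]                              # orthonormal basis of old subspace
    return basis @ basis.T

def project_recurrent_update(dW, P_old):
    """Remove the part of a candidate update dW that acts on the old subspace.

    For states x in the old activity subspace, (dW @ (I - P_old)) @ x = 0,
    so the updated weights leave previously learned dynamics unchanged there.
    """
    n = P_old.shape[0]
    return dW @ (np.eye(n) - P_old)

# Toy usage: low-dimensional hidden activity from a "previous task"
# and a raw gradient step proposed while learning a new task.
n_units, n_steps = 50, 200
old_activity = rng.standard_normal((n_steps, 5)) @ rng.standard_normal((5, n_units))
P_old = activity_subspace_projector(old_activity)
raw_grad = rng.standard_normal((n_units, n_units))
safe_grad = project_recurrent_update(raw_grad, P_old)

# The projected update barely perturbs responses to old-task states.
x_old = old_activity[0]
print(np.linalg.norm(raw_grad @ x_old), np.linalg.norm(safe_grad @ x_old))
```

In a sequential-training loop, one would accumulate such a projector after each task and apply it to every subsequent gradient step, so that dynamics for new tasks settle into directions orthogonal to those already in use while dynamics in the old subspaces remain available for reuse.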
