Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments

A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state-of-the-art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment in which a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model’s prediction task changes throughout training. Analysis of both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results in both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve.
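
To make the gating mechanism concrete, below is a minimal NumPy sketch of a single hidden layer combining active dendrites with k-Winner-Take-All (kWTA) inhibition, in the spirit of the architecture described above. The specifics (the `ActiveDendriteLayer` and `kwta` names, the number of segments, sigmoidal gating, and max-over-segments selection) are illustrative assumptions rather than the authors' released implementation: each unit's feedforward response is scaled by how strongly its best dendritic segment matches a context vector, and kWTA then keeps only a small fraction of units active, yielding the sparse, context-specific representations discussed in the abstract.

```python
# Illustrative sketch (hypothetical names and hyperparameters), not the paper's reference code.
import numpy as np

def kwta(x, k):
    """k-Winners-Take-All: keep the k largest activations, zero out the rest."""
    out = np.zeros_like(x)
    winners = np.argsort(x)[-k:]
    out[winners] = x[winners]
    return out

class ActiveDendriteLayer:
    """Hidden layer whose units carry dendritic segments gated by a context vector."""

    def __init__(self, in_dim, out_dim, ctx_dim, n_segments=4, k_winners=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(out_dim, in_dim))               # feedforward weights
        self.b = np.zeros(out_dim)                                          # biases
        self.U = rng.normal(0.0, 0.1, size=(out_dim, n_segments, ctx_dim))  # dendritic segment weights
        self.k = k_winners

    def forward(self, x, context):
        t = self.W @ x + self.b              # feedforward response of each unit
        d = (self.U @ context).max(axis=1)   # strongest segment response per unit
        gate = 1.0 / (1.0 + np.exp(-d))      # sigmoidal dendritic modulation
        return kwta(t * gate, self.k)        # sparse, context-gated layer output

# Example: the same input routed through two different (one-hot) task contexts.
layer = ActiveDendriteLayer(in_dim=16, out_dim=32, ctx_dim=4)
x = np.random.default_rng(1).random(16)
h_task0 = layer.forward(x, np.eye(4)[0])
h_task1 = layer.forward(x, np.eye(4)[1])
```

Because different context vectors excite different dendritic segments, distinct subsets of units survive the kWTA step for each task, so the same feedforward weights can serve many tasks while updates for one task leave most of the other tasks' active units untouched.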
