A dual-memory architecture for reinforcement learning on neuromorphic platforms

Reinforcement learning (RL) is foundational to learning in biological systems and provides a framework for addressing numerous challenges in real-world artificial intelligence applications. Efficient implementations of RL techniques could enable agents deployed at the edge to gain new abilities, such as improved navigation, understanding of complex situations, and critical decision making. Toward this goal, we describe a flexible architecture for carrying out RL on neuromorphic platforms. We implemented this architecture on an Intel neuromorphic processor and demonstrated it solving a variety of tasks using spiking dynamics. Our study proposes a usable solution for real-world RL applications and demonstrates the applicability of neuromorphic platforms to RL problems.
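The dual-memory idea named in the title can be illustrated with a minimal sketch: a fast episodic memory stores recent transitions, while a slow value memory is updated both online and by replaying the buffer. This is a hypothetical tabular toy (a short 1-D chain task with hedged names such as `buffer` and `td_update`), not the spiking implementation described in the paper.

```python
import random

random.seed(0)

N_STATES = 5          # chain of states 0..4; reward at the right end
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # slow memory
buffer = []                                                  # fast episodic memory

def step(s, a):
    """Move along the chain; reward 1 on reaching the rightmost state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def td_update(s, a, r, s2):
    """Standard one-step Q-learning update on the slow memory."""
    target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPS else \
            max(ACTIONS, key=lambda b: q[(s, b)])
        s2, r = step(s, a)
        td_update(s, a, r, s2)          # online (fast) learning
        buffer.append((s, a, r, s2))    # store episode in fast memory
        s = s2
    # offline consolidation: replay a sample of stored transitions
    for tr in random.sample(buffer, min(32, len(buffer))):
        td_update(*tr)

# The learned greedy policy should move right from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda b: q[(s, b)]) for s in range(N_STATES - 1)}
print(policy)
```

The replay loop plays the role of consolidation in complementary-learning-systems accounts: experiences held briefly in the fast store are gradually absorbed into the slow value estimate.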
