Demonstrating Advantages of Neuromorphic Computation: A Pilot Study

Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent plasticity in a spiking network that learns to play a simplified version of the Pong video game by smooth pursuit. This system combines an electronic mixed-signal substrate for emulating neuron and synapse dynamics with an embedded digital processor for on-chip learning, which in this work also serves to simulate the virtual environment and learning agent. The analog emulation of neuronal membrane dynamics enables a 1000-fold acceleration with respect to biological real-time, with the entire chip operating on a power budget of 57 mW. Compared to an equivalent simulation using state-of-the-art software, the on-chip emulation is at least one order of magnitude faster and three orders of magnitude more energy-efficient. We demonstrate how on-chip learning can mitigate the effects of fixed-pattern noise, which is unavoidable in analog substrates, while making use of temporal variability for action exploration. Learning compensates for imperfections of the physical substrate, which manifest as variability in neuronal parameters, by adapting synaptic weights to match the respective excitability of individual neurons.
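
To illustrate the kind of learning rule named above, a reward-modulated STDP weight update can be sketched as follows. This is a minimal sketch under stated assumptions, not the on-chip implementation: the function names, the learning rate, the running reward baseline and its smoothing factor are all hypothetical, and the eligibility values stand in for whatever STDP correlation measurement is accumulated at each synapse.

```python
def rstdp_update(weights, eligibility, reward, mean_reward, learning_rate=0.1):
    """Return new weights after one reward-modulated update.

    Assumed rule (illustrative): dw_i = learning_rate * (reward - mean_reward) * e_i,
    where e_i is the STDP-derived eligibility of synapse i.
    """
    modulation = learning_rate * (reward - mean_reward)
    return [w + modulation * e for w, e in zip(weights, eligibility)]


def update_mean_reward(mean_reward, reward, gamma=0.5):
    """Exponentially smoothed reward baseline (gamma is an assumed smoothing factor)."""
    return mean_reward + gamma * (reward - mean_reward)


# Example: three synapses, one trial with above-baseline reward.
weights = [0.0, 0.0, 0.0]
eligibility = [1.0, 0.0, 2.0]  # hypothetical pre/post spike correlations
new_weights = rstdp_update(weights, eligibility, reward=1.0, mean_reward=0.5)
new_baseline = update_mean_reward(0.5, 1.0)
```

Subtracting a running reward baseline means that only better-than-expected outcomes strengthen recently active synapses. Under this assumed rule, a neuron that is less excitable because of fixed-pattern noise can still be driven to the desired activity, since its weights keep growing until the reward matches the baseline.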
