Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients

Artificial Neural Networks (ANNs) are currently used as function approximators in many state-of-the-art Reinforcement Learning (RL) algorithms. Spiking Neural Networks (SNNs) have been shown to drastically reduce energy consumption relative to ANNs by encoding information in sparse, temporal, binary spike streams, thereby emulating the communication mechanism of biological neurons. Owing to their low energy consumption, SNNs are considered important candidates for co-processors in mobile devices. In this work, the use of SNNs as stochastic policies is explored under an energy-efficient first-to-spike action rule, whereby the action taken by the RL agent is determined by the first spike emitted among the output neurons. A policy gradient-based algorithm is derived for spiking neurons described by a Generalized Linear Model (GLM). Experimental results demonstrate that online-trained SNNs used as stochastic policies can gracefully trade off energy consumption, as measured by the number of spikes, against control performance. Significant gains are shown compared with the standard approach of converting an offline-trained ANN into an SNN.
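
The abstract does not state the model equations. The following is only a minimal Python sketch, built on our own assumptions, of how a GLM output layer, the first-to-spike action rule, and a REINFORCE-style policy gradient could fit together. Assumptions not taken from the source: each output neuron's membrane potential is a bias plus a learnable kernel over the last `window` input time steps, and a sigmoid of that potential gives a per-step Bernoulli spike probability. Ties at the first spiking step are broken uniformly at random, and if no output neuron fires within the input duration a uniformly random action is taken. The class and method names (`GLMFirstToSpikePolicy`, `act`, `log_prob_and_grad`, `reinforce_update`) and the toy task are hypothetical.

```python
import math
import random
from typing import List, Tuple

Spikes = List[List[int]]  # spikes[t][n] in {0, 1}


def log_sigmoid(u: float) -> float:
    """Numerically stable log(sigmoid(u))."""
    if u >= 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))


class GLMFirstToSpikePolicy:
    """Hypothetical sketch: output i spikes at step t with prob sigmoid(u[i][t])."""

    def __init__(self, n_in: int, n_out: int, window: int,
                 bias: float = -2.0, seed: int = 0):
        rng = random.Random(seed)
        self.n_in, self.n_out, self.window = n_in, n_out, window
        # w[i][j][k]: synaptic kernel from input j to output i at lag k + 1
        self.w = [[[rng.gauss(0.0, 0.1) for _ in range(window)]
                   for _ in range(n_in)] for _ in range(n_out)]
        # Negative bias keeps spontaneous firing sparse (an assumption).
        self.b = [bias] * n_out

    def potential(self, x: Spikes, i: int, t: int) -> float:
        """u[i][t] = b[i] + sum_j sum_k w[i][j][k] * x[t-1-k][j] (past inputs only)."""
        u = self.b[i]
        for j in range(self.n_in):
            for k in range(self.window):
                tau = t - 1 - k
                if tau >= 0:
                    u += self.w[i][j][k] * x[tau][j]
        return u

    def act(self, x: Spikes, rng: random.Random) -> Tuple[int, Spikes]:
        """First-to-spike rule: simulate until some output neuron fires, then stop."""
        out: Spikes = []
        for t in range(len(x)):
            s_t = [int(rng.random() < math.exp(log_sigmoid(self.potential(x, i, t))))
                   for i in range(self.n_out)]
            out.append(s_t)
            fired = [i for i in range(self.n_out) if s_t[i]]
            if fired:
                return rng.choice(fired), out  # ties broken uniformly (assumption)
        return rng.randrange(self.n_out), out  # no spike: random fallback (assumption)

    def log_prob_and_grad(self, x: Spikes, out: Spikes):
        """Bernoulli log-likelihood of the observed output spikes and its gradient."""
        logp = 0.0
        gw = [[[0.0] * self.window for _ in range(self.n_in)] for _ in range(self.n_out)]
        gb = [0.0] * self.n_out
        for t, s_t in enumerate(out):
            for i in range(self.n_out):
                u = self.potential(x, i, t)
                # log(1 - sigmoid(u)) == log_sigmoid(-u)
                logp += log_sigmoid(u) if s_t[i] else log_sigmoid(-u)
                err = s_t[i] - math.exp(log_sigmoid(u))  # d logp / d u = s - sigmoid(u)
                gb[i] += err
                for j in range(self.n_in):
                    for k in range(self.window):
                        tau = t - 1 - k
                        if tau >= 0:
                            gw[i][j][k] += err * x[tau][j]
        return logp, gw, gb

    def reinforce_update(self, x: Spikes, out: Spikes, reward: float, lr: float) -> None:
        """REINFORCE step without baseline: theta += lr * reward * grad log p(out)."""
        _, gw, gb = self.log_prob_and_grad(x, out)
        for i in range(self.n_out):
            self.b[i] += lr * reward * gb[i]
            for j in range(self.n_in):
                for k in range(self.window):
                    self.w[i][j][k] += lr * reward * gw[i][j][k]


if __name__ == "__main__":
    # Hypothetical toy task: input neuron `target` fires; reward if action == target.
    rng = random.Random(0)
    policy = GLMFirstToSpikePolicy(n_in=2, n_out=2, window=3)
    total_spikes = 0
    for episode in range(200):
        target = rng.randrange(2)
        x = [[int(j == target and rng.random() < 0.8) for j in range(2)] for _ in range(10)]
        action, out = policy.act(x, rng)
        total_spikes += sum(map(sum, out))  # output-spike count as an energy proxy
        policy.reinforce_update(x, out, 1.0 if action == target else -1.0, lr=0.1)
    print("output spikes over 200 episodes:", total_spikes)
```

In this sketch, the action is a deterministic function of the output spikes plus tie-breaking randomness that does not depend on the parameters. The gradient of the log-likelihood of the observed (truncated) output spike train is therefore a valid score-function estimator. Stopping the simulation at the first output spike keeps the number of output spikes per decision small, which is the energy measure referred to in the abstract. No reward baseline is subtracted; this keeps the code short at the cost of higher-variance updates.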
