Deep Reinforcement Learning with Population-Coded Spiking Neural Network for Continuous Control

The energy-efficient control of mobile robots is crucial as their real-world applications increasingly involve high-dimensional observation and action spaces, whose computational demands limited on-board resources cannot meet. An emerging non-von Neumann model of intelligence, in which spiking neural networks (SNNs) run on neuromorphic processors, is regarded as an energy-efficient and robust alternative to state-of-the-art real-time robotic controllers for low-dimensional control tasks. The challenge for this new computing paradigm is to scale so that it can keep up with real-world tasks. To do so, SNNs need to overcome the inherent limitations of their training, namely the limited ability of their spiking neurons to represent information and the lack of effective learning algorithms. Here, we propose a population-coded spiking actor network (PopSAN) trained in conjunction with a deep critic network using deep reinforcement learning (DRL). The population coding scheme dramatically increased the representation capacity of the network, and the hybrid learning combined the training advantages of deep networks with the energy-efficient inference of spiking networks. To show the general applicability of our approach, we integrated it with a spectrum of both on-policy and off-policy DRL algorithms. We deployed the trained PopSAN on Intel's Loihi neuromorphic chip and benchmarked our method against mainstream DRL algorithms for continuous control. To allow for a fair comparison among all methods, we validated them on OpenAI Gym tasks. Our Loihi-run PopSAN consumed 140 times less energy per inference than the deep actor network on Jetson TX2 while achieving the same level of performance. Our results support the efficiency of neuromorphic controllers and suggest our hybrid RL as an alternative to deep learning when both energy efficiency and robustness are important.
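As a rough illustration of why population coding can increase representation capacity, the sketch below encodes one scalar observation as the graded activity of several neurons with overlapping receptive fields, then recovers it from that activity. It is a generic textbook-style example, not the PopSAN implementation: the Gaussian receptive fields, the evenly spaced centers, the activity-weighted decoder, and the names `population_encode` and `population_decode` are all assumptions made here, and no spiking dynamics are modeled.

```python
import math


def population_encode(value, low, high, num_neurons=10, sigma=None):
    """Encode a scalar as activations of neurons with Gaussian receptive fields.

    Centers are spaced evenly over [low, high] (an assumption of this sketch).
    """
    if num_neurons < 2:
        raise ValueError("num_neurons must be at least 2")
    spacing = (high - low) / (num_neurons - 1)
    if sigma is None:
        # Assumed default: field width equal to the spacing between centers.
        sigma = spacing
    centers = [low + i * spacing for i in range(num_neurons)]
    activations = [
        math.exp(-0.5 * ((value - c) / sigma) ** 2) for c in centers
    ]
    return centers, activations


def population_decode(centers, activations):
    """Recover the scalar as the activity-weighted mean of the centers."""
    total = sum(activations)
    return sum(c * a for c, a in zip(centers, activations)) / total


# One observation dimension in [-1, 1] represented by 10 neurons.
centers, acts = population_encode(0.3, low=-1.0, high=1.0)
estimate = population_decode(centers, acts)
print(f"most active center: {centers[acts.index(max(acts))]:.3f}")
print(f"decoded estimate:   {estimate:.3f}")
```

In a spiking network the activations would typically drive spike generation rather than being read out directly; that step is left out here to keep the sketch deterministic.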
