Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision

Event-based vision sensors encode local, pixel-wise brightness changes as streams of events rather than full image frames, yielding sparse, energy-efficient encodings of scenes together with low latency, high dynamic range, and no motion blur. Recent progress in object recognition from event-based sensors has come from conversions of successful deep neural network architectures, which are trained with backpropagation. However, applying these approaches to event streams requires a transformation to a synchronous paradigm, which not only sacrifices computational efficiency but also misses opportunities to extract spatio-temporal features. In this article we propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection. It combines a spiking neural network (SNN) backbone for efficient event-based feature extraction with a subsequent classical analog neural network (ANN) head that solves synchronous classification and detection tasks. This is achieved by combining standard backpropagation with surrogate gradient training to propagate gradients inside the SNN layers. Hybrid SNN-ANNs can be trained without additional conversion steps and result in highly accurate networks that are substantially more computationally efficient than their ANN counterparts. We demonstrate results on event-based classification and object detection datasets, in which only the architecture of the ANN heads needs to be adapted to the tasks, and no conversion of the event-based input is necessary. Since ANNs and SNNs require different hardware paradigms to maximize their efficiency, we envision that the SNN backbone and the ANN head can be executed on different processing units, and we therefore analyze the bandwidth necessary to communicate between the two parts. Hybrid networks are promising architectures for further advancing machine learning approaches to event-based vision without compromising on efficiency.
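The abstract states that surrogate gradient training propagates gradients inside the SNN layers. A minimal sketch of that idea, assuming a single discrete-time leaky integrate-and-fire (LIF) neuron with reset to zero and a SuperSpike-style fast-sigmoid surrogate; the function names and the parameters `beta`, `theta`, and `k` are hypothetical and not taken from the paper. The forward pass keeps the exact step-function spike, while the gradient substitutes a smooth derivative so the weight still receives a useful learning signal.

```python
# Hypothetical sketch, not the paper's implementation: one discrete-time LIF
# neuron with reset to zero and a fast-sigmoid surrogate derivative.

def heaviside(u):
    """Forward pass: emit a spike (1.0) when the membrane reaches threshold."""
    return 1.0 if u >= 0.0 else 0.0


def fast_sigmoid_grad(u, k=10.0):
    """Surrogate for dH/du: smooth, non-zero everywhere, peaks at threshold."""
    return 1.0 / (1.0 + k * abs(u)) ** 2


def lif_spike_count_grad(inputs, w, target, beta=0.9, theta=1.0):
    """Run one LIF neuron and return (spike_count, loss, dloss/dw).

    The loss is (spike_count - target)^2. The gradient is accumulated in
    forward mode through the membrane recursion; the reset term is treated
    as a constant (detached), a common simplification.
    """
    v, dv_dw, s_prev = 0.0, 0.0, 0.0
    spikes, surrogate_terms = [], []
    for x in inputs:
        keep = beta * (1.0 - s_prev)   # leak factor, zeroed right after a spike
        v = keep * v + w * x           # membrane update
        dv_dw = keep * dv_dw + x       # d v_t / d w through the same recursion
        s = heaviside(v - theta)       # exact, non-differentiable spike
        spikes.append(s)
        # ds_t/dw is approximated by surrogate(v_t - theta) * dv_t/dw
        surrogate_terms.append(fast_sigmoid_grad(v - theta) * dv_dw)
        s_prev = s
    count = sum(spikes)
    grad = 2.0 * (count - target) * sum(surrogate_terms)
    return count, (count - target) ** 2, grad


if __name__ == "__main__":
    inputs = [1.0] * 10
    w = 0.1
    for _ in range(50):  # plain gradient descent on the single weight w
        _, _, grad = lif_spike_count_grad(inputs, w, target=3.0)
        w -= 0.01 * grad
    count, loss, _ = lif_spike_count_grad(inputs, w, target=3.0)
    print(f"w={w:.3f} spikes={count:.0f} loss={loss:.1f}")
```

A true Heaviside derivative is zero almost everywhere, so a silent neuron (as with `w = 0.1` above) would never learn; the surrogate keeps the gradient non-zero below threshold, which is what lets backpropagation pass through the SNN layers.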
Hybrid SNN-ANN: Energy-Efficient Event-Based Vision. A. Kugele et al.
1 Bosch Center for Artificial Intelligence, 71272 Renningen, Germany
2 Bio-Inspired Circuits and Systems (BICS) Lab, Zernike Inst Adv Mat, University of Groningen, Nijenborgh 4, NL-9747 AG Groningen, Netherlands
3 Groningen Cognitive Systems and Materials Center (CogniGron), University of Groningen, Nijenborgh 4, NL-9747 AG Groningen, Netherlands
∗ Corresponding author
arXiv:2112.03423v1 [cs.CV] 6 Dec 2021

[Figure: SNN backbone (asynchronous, sparse, no preprocessing) → synchronous readouts (features aggregate over time) → ANN head (class, bounding boxes)]
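The architecture pairs an asynchronous SNN backbone with synchronous readouts that aggregate features over time, followed by an ANN head, and the abstract mentions analyzing the bandwidth between the two parts. The sketch below assumes a per-channel spike-count readout over fixed windows, a single dense layer as the head, and a simple bits-per-second model for the link; the names `SpikeEvent`, `aggregate_window`, `linear_head`, and `bandwidth_bits_per_s`, as well as the bit widths, are hypothetical, and the paper's actual readout and bandwidth analysis may differ.

```python
from dataclasses import dataclass

# Hypothetical sketch: SNN backbone spikes -> windowed readout -> toy ANN head,
# plus a rough link-bandwidth estimate between two processing units.


@dataclass
class SpikeEvent:
    t_us: int      # timestamp in microseconds
    channel: int   # index of the backbone output neuron that spiked


def aggregate_window(events, n_channels, t_start_us, window_us):
    """Synchronous readout: count spikes per channel in [t_start, t_start + window)."""
    counts = [0] * n_channels
    for e in events:
        if t_start_us <= e.t_us < t_start_us + window_us:
            counts[e.channel] += 1
    return counts


def linear_head(features, weights, bias):
    """Toy ANN head: a single dense layer mapping readout features to scores."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def bandwidth_bits_per_s(n_events, bits_per_event, n_channels, bits_per_value, window_us):
    """Link bandwidth for one readout window, under two transmission schemes.

    sparse: send every backbone spike as an event (e.g. address + timestamp).
    dense:  send the aggregated feature vector once per window.
    """
    window_s = window_us / 1e6
    sparse = n_events * bits_per_event / window_s
    dense = n_channels * bits_per_value / window_s
    return sparse, dense


if __name__ == "__main__":
    events = [SpikeEvent(10, 0), SpikeEvent(20, 2), SpikeEvent(30, 2), SpikeEvent(1500, 1)]
    features = aggregate_window(events, n_channels=3, t_start_us=0, window_us=1000)
    scores = linear_head(features, weights=[[1, 1, 1], [0, 0, 1]], bias=[0, 0.5])
    # Assumed widths: 32 bits per event, 8 bits per spike count.
    sparse, dense = bandwidth_bits_per_s(sum(features), 32, 3, 8, 1000)
    print(features, scores, sparse, dense)
```

Under this model, sending sparse events is cheaper than sending the dense readout only while `n_events * bits_per_event < n_channels * bits_per_value` within each window, so the better choice depends on how sparse the backbone's activity is.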