Deep representation via convolutional neural network for classification of spatiotemporal event streams

Abstract: Unlike traditional frame-based cameras, the event-based dynamic vision sensor (DVS) converts visual information into spatiotemporal event streams. Convolutional neural networks (CNNs) have recently achieved outstanding classification performance, but they require a very large number of annotated samples. The lack of large-scale event-stream datasets has therefore prevented the application of CNNs to the classification of such event streams. In this work, we show how the deep representation learned by an originally optimized (pre-trained) CNN can be efficiently transferred to event-stream classification tasks. In our classification method, a spike-event temporal coding is used to encode the spike-event information of each pixel. This temporal coding mechanism is based on the subthreshold dynamics of the leaky integrate-and-fire (LIF) model. Three popular event-stream datasets were used to evaluate the proposed method. The results show that the proposed method significantly improves classification accuracy and outperforms the current state-of-the-art methods on all three datasets. In addition, the robustness of the method was verified on the MNIST-DVS dataset with Gaussian temporal noise added to the event timestamps. Finally, we find that fine-tuning with a small amount of event-stream data further improves classification performance. This work can readily be extended to more complex scenarios and to further visual applications.
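To make the per-pixel temporal coding concrete, the following is a minimal sketch of one way a subthreshold LIF response could turn an event stream into a frame-like input for a CNN. It is an illustration under stated assumptions, not the authors' implementation: the function names `lif_temporal_code` and `add_timestamp_jitter`, the time constant `tau`, the unit jump per event, and the choice of readout time are all hypothetical. Each event is assumed to raise the membrane potential of its pixel by 1, after which the potential decays exponentially with no threshold or reset.

```python
import numpy as np


def lif_temporal_code(x, y, t, shape, tau=20e3, t_read=None):
    """Encode an event stream as a 2D map of subthreshold LIF potentials.

    Assumption (not from the paper): each event adds 1 to its pixel's
    potential, which then decays as exp(-dt / tau) without firing.
    x, y are pixel coordinates, t are timestamps, shape is (height, width).
    """
    x = np.asarray(x, dtype=np.intp)
    y = np.asarray(y, dtype=np.intp)
    t = np.asarray(t, dtype=np.float64)
    if t_read is None:
        # Read the potentials out at the time of the last event.
        t_read = t.max() if t.size else 0.0
    # Ignore events that occur after the readout time.
    keep = t <= t_read
    x, y, t = x[keep], y[keep], t[keep]
    frame = np.zeros(shape, dtype=np.float64)
    # Decayed contribution of every event at readout time.
    contrib = np.exp(-(t_read - t) / tau)
    # Unbuffered accumulation so repeated pixels sum correctly.
    np.add.at(frame, (y, x), contrib)
    return frame


def add_timestamp_jitter(t, sigma, rng):
    """Add zero-mean Gaussian noise to timestamps (for robustness tests)."""
    t = np.asarray(t, dtype=np.float64)
    return t + rng.normal(0.0, sigma, size=t.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.integers(0, 128, size=1000)
    ys = rng.integers(0, 128, size=1000)
    ts = np.sort(rng.uniform(0, 100e3, size=1000))  # timestamps in microseconds
    frame = lif_temporal_code(xs, ys, ts, shape=(128, 128))
    noisy = lif_temporal_code(xs, ys, add_timestamp_jitter(ts, 1e3, rng),
                              shape=(128, 128), t_read=ts.max())
    print(frame.shape, float(np.abs(frame - noisy).mean()))
```

In this sketch, recent events dominate the map while older ones fade, so the resulting frame keeps some timing information that a plain event count would discard. The frame would still need to be resized and normalized to match whatever input a given pre-trained CNN expects.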
