HATS: Histograms of Averaged Time Surfaces for Robust Event-Based Object Classification

Event-based cameras have recently drawn the attention of the Computer Vision community thanks to their high temporal resolution, low power consumption and high dynamic range compared to traditional frame-based cameras. These properties make event-based cameras an ideal choice for autonomous vehicles, robot navigation and UAV vision, among other applications. However, the accuracy of event-based object classification algorithms, which is of crucial importance for any reliable system working in real-world conditions, is still far behind that of their frame-based counterparts. Two main reasons for this performance gap are (1) the lack of effective low-level representations and architectures for event-based object classification and (2) the absence of large real-world event-based datasets. In this paper we address both problems. First, we introduce a novel event-based feature representation together with a new machine learning architecture. Unlike previous approaches, we use local memory units to efficiently leverage past temporal information and build a robust event-based representation. Second, we release the first large real-world event-based dataset for object classification. We compare our method to the state of the art in extensive experiments, showing better classification performance and real-time computation.
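As a rough illustration of what "histograms of averaged time surfaces" built from local memory units could look like, the sketch below groups pixels into cells, keeps a short memory of recent events per cell and polarity, and averages the time surfaces computed for each incoming event. The exponential decay kernel, the function name `averaged_time_surfaces`, and the parameters `cell_size`, `radius`, `tau` and `window` are illustrative assumptions, not details taken from the abstract.

```python
import math
from collections import defaultdict


def averaged_time_surfaces(events, cell_size=4, radius=1, tau=0.1, window=0.5):
    """Hypothetical sketch: average per-event time surfaces inside each cell.

    events: iterable of (x, y, t, polarity) tuples sorted by timestamp t.
    Returns a dict mapping (cell_x, cell_y, polarity) to an averaged
    (2*radius+1) x (2*radius+1) surface stored as a list of rows.
    """
    side = 2 * radius + 1
    memory = defaultdict(list)   # assumed local memory: past events per cell/polarity
    sums = {}                    # running sum of time surfaces per cell/polarity
    counts = defaultdict(int)    # number of events seen per cell/polarity

    for x, y, t, p in events:
        key = (x // cell_size, y // cell_size, p)
        # Keep only past events that fall inside the temporal window.
        recent = [e for e in memory[key] if t - e[2] <= window]

        # Time surface of the current event: each remembered neighbour
        # contributes an exponentially decayed value (assumed kernel).
        surface = [[0.0] * side for _ in range(side)]
        for ex, ey, et in recent:
            dx, dy = ex - x, ey - y
            if abs(dx) <= radius and abs(dy) <= radius:
                surface[dy + radius][dx + radius] += math.exp(-(t - et) / tau)

        acc = sums.setdefault(key, [[0.0] * side for _ in range(side)])
        for r in range(side):
            for c in range(side):
                acc[r][c] += surface[r][c]
        counts[key] += 1

        # Store the current event so later events can use it.
        recent.append((x, y, t))
        memory[key] = recent

    # Normalise by the event count so busy cells do not dominate.
    return {
        key: [[v / counts[key] for v in row] for row in acc]
        for key, acc in sums.items()
    }


features = averaged_time_surfaces([(0, 0, 0.0, 1), (1, 0, 0.1, 1)])
```

Dividing by the per-cell event count is one way to make the representation less sensitive to how many events a region produces; whether this matches the normalisation used in the paper is not stated in the abstract.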
