End-to-End Learning of Representations for Asynchronous Event-Based Data

Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events”. They have appealing advantages over frame based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatio-temporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it by a standard vision pipeline, e.g., Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations by means of strictly differentiable operations. Our framework comes with two main advantages: (i) allows learning the input event representation together with the task dedicated network in an end to end manner, and (ii) lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that our approach to learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.

[1]  Davide Scaramuzza,et al.  Event-based, 6-DOF pose tracking for high-speed maneuvers , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[2]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[3]  Michael J. Black,et al.  A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles Behind Them , 2013, International Journal of Computer Vision.

[4]  Andrea Censi Efficient neuromorphic optomotor heading regulation , 2015, 2015 American Control Conference (ACC).

[5]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Tobi Delbrück,et al.  Training Deep Spiking Neural Networks Using Backpropagation , 2016, Front. Neurosci..

[7]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[8]  Matthew Cook,et al.  Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[9]  Ashok Veeraraghavan,et al.  Direct face detection and video reconstruction from event cameras , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[10]  Chiara Bartolozzi,et al.  Asynchronous frameless event-based optical flow , 2012, Neural Networks.

[11]  Louis B. Rall,et al.  Automatic differentiation , 1981 .

[12]  Davide Scaramuzza,et al.  Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High-Speed Scenarios , 2017, IEEE Robotics and Automation Letters.

[13]  Vijay Kumar,et al.  The Multivehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception , 2018, IEEE Robotics and Automation Letters.

[14]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Chiara Bartolozzi,et al.  Event-Based Vision: A Survey , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Davide Scaramuzza,et al.  Asynchronous, Photometric Feature Tracking using Events and Frames , 2018, ECCV.

[17]  Alexander Andreopoulos,et al.  A Low Power, High Throughput, Fully Event-Based Stereo System , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Kostas Daniilidis,et al.  Event-Based Visual Inertial Odometry , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Kostas Daniilidis,et al.  EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras , 2018, Robotics: Science and Systems.

[20]  Shih-Chii Liu,et al.  Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences , 2016, NIPS.

[21]  Davide Scaramuzza,et al.  EVO: A Geometric Approach to Event-Based 6-DOF Parallel Tracking and Mapping in Real Time , 2017, IEEE Robotics and Automation Letters.

[22]  Davide Scaramuzza,et al.  Event-Based, 6-DOF Camera Tracking from Photometric Depth Maps , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Stefan Leutenegger,et al.  Simultaneous Optical Flow and Intensity Estimation from an Event Camera , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Davide Scaramuzza,et al.  EMVS: Event-Based Multi-View Stereo—3D Reconstruction with an Event Camera in Real-Time , 2017, International Journal of Computer Vision.

[25]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[26]  Nitish V. Thakor,et al.  HFirst: A Temporal Approach to Object Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Kostas Daniilidis,et al.  Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Daniel Matolin,et al.  A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS , 2011, IEEE Journal of Solid-State Circuits.

[32]  Kostas Daniilidis,et al.  Realtime Time Synchronized Event-based Stereo , 2018, ECCV.

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[34]  Kostas Daniilidis,et al.  Unsupervised Event-Based Optical Flow Using Motion Compensation , 2018, ECCV Workshops.

[35]  Ryad Benosman,et al.  HATS: Histograms of Averaged Time Surfaces for Robust Event-Based Object Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[37]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Garrick Orchard,et al.  HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[40]  Kyoobin Lee,et al.  Computationally efficient, real-time motion recognition based on bio-inspired visual and cognitive processing , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[41]  Tobi Delbrück,et al.  A Low Power, Fully Event-Based Gesture Recognition System , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[43]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[44]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[45]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[46]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Davide Scaramuzza,et al.  A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth, and Optical Flow Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Stefan Roth,et al.  UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss , 2017, AAAI.

[49]  Terrence J. Sejnowski,et al.  Gradient Descent for Spiking Neural Networks , 2017, NeurIPS.

[50]  F. Ponulak ReSuMe-New Supervised Learning Method for Spiking Neural Networks , 2005 .

[51]  Tobi Delbrück,et al.  A 128$\times$ 128 120 dB 15 $\mu$s Latency Asynchronous Temporal Contrast Vision Sensor , 2008, IEEE Journal of Solid-State Circuits.

[52]  Tobi Delbruck,et al.  A 240 × 180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor , 2014, IEEE Journal of Solid-State Circuits.

[53]  Gregory Cohen,et al.  Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades , 2015, Front. Neurosci..

[54]  Christian Heipke,et al.  Joint 3d Estimation of Vehicles and Scene Flow , 2015 .

[55]  C. P. A C O R E T,et al.  Asynchronous event-based high speed vision for microparticle tracking , 2011 .

[56]  T. Delbruck,et al.  > Replace This Line with Your Paper Identification Number (double-click Here to Edit) < 1 , 2022 .

[57]  Davide Scaramuzza,et al.  Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization , 2017, BMVC.

[58]  Chiara Bartolozzi,et al.  Event-Based Visual Flow , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[59]  Ryad Benosman,et al.  Asynchronous Event-Based Hebbian Epipolar Geometry , 2011, IEEE Transactions on Neural Networks.

[60]  Kostas Daniilidis,et al.  Event-based feature tracking with probabilistic data association , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[61]  Andrew Zisserman,et al.  Efficient Visual Search of Videos Cast as Text Retrieval , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[62]  Tobi Delbrück,et al.  Live demonstration: Gesture-based remote control using stereo pair of dynamic vision sensors , 2012, 2012 IEEE International Symposium on Circuits and Systems.

[63]  Stefan Schliebs,et al.  SPAN: A Neuron for Precise-Time Spike Pattern Association , 2011, ICONIP.

[64]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[65]  Quoc V. Le,et al.  On optimization methods for deep learning , 2011, ICML.

[66]  Stefan Leutenegger,et al.  Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera , 2016, ECCV.

[67]  Bernabé Linares-Barranco,et al.  Mapping from Frame-Driven to Frame-Free Event-Driven Vision Systems by Low-Rate Rate Coding and Coincidence Processing--Application to Feedforward ConvNets , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Narciso García,et al.  Event-Based Vision Meets Deep Learning on Steering Prediction for Self-Driving Cars , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[69]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[71]  Bernabé Linares-Barranco,et al.  Feedforward Categorization on AER Motion Events Using Cortex-Like Features in a Spiking Neural Network , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[72]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.