Video to Events: Recycling Video Datasets for Event Cameras

Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages over conventional cameras: high dynamic range (HDR), high temporal resolution, and no motion blur. Recently, novel learning approaches operating on event data have achieved impressive results. Yet, these methods require a large amount of event data for training, which is scarce due to the novelty of event sensors in computer vision research. In this paper, we present a method that addresses this need by converting any existing video dataset recorded with conventional cameras into synthetic event data. This unlocks the use of a virtually unlimited number of existing video datasets for training networks designed for real event data. We evaluate our method on two relevant vision tasks, i.e., object recognition and semantic segmentation, and show that models trained on synthetic events have several benefits: (i) they generalize well to real event data, even in scenarios where standard-camera images are blurry or overexposed, by inheriting the outstanding properties of event cameras; (ii) they can be used for fine-tuning on real data to improve over the state of the art for both classification and semantic segmentation.
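The conversion idea can be illustrated with the standard contrast-threshold event model used by event-camera simulators: an event fires at a pixel whenever its log intensity changes by more than a fixed threshold. The snippet below is a minimal sketch under that assumption only; it omits frame interpolation, noise modeling, and other details of the actual pipeline, and the function name, threshold value, and timestamp interpolation are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def frames_to_events(frame_prev, frame_next, t_prev, t_next, threshold=0.15):
    """Generate synthetic events between two consecutive grayscale frames.

    An event (x, y, t, polarity) is emitted each time the log intensity at a
    pixel crosses a multiple of `threshold` between the two frames.
    """
    eps = 1e-3  # avoid log(0) on dark pixels
    log_prev = np.log(frame_prev.astype(np.float32) + eps)
    log_next = np.log(frame_next.astype(np.float32) + eps)
    delta = log_next - log_prev

    events = []
    ys, xs = np.nonzero(np.abs(delta) >= threshold)
    for y, x in zip(ys, xs):
        polarity = 1 if delta[y, x] > 0 else -1
        n_events = int(abs(delta[y, x]) // threshold)  # one event per threshold crossing
        for k in range(1, n_events + 1):
            # linearly interpolate the timestamp of the k-th crossing
            t = t_prev + (t_next - t_prev) * (k * threshold) / abs(delta[y, x])
            events.append((x, y, t, polarity))
    events.sort(key=lambda e: e[2])  # events are asynchronous, ordered by time
    return events
```

In practice, applying such a model directly to low-frame-rate video would miss fast motion between frames, which is why upsampling the video temporally (e.g., via frame interpolation) before event simulation is important.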
