Back to Event Basics: Self-Supervised Learning of Image Reconstruction for Event Cameras via Photometric Constancy

Event cameras are novel vision sensors that asynchronously sample brightness increments with low latency and high temporal resolution. The resulting event streams are valuable in their own right, especially for high-speed motion estimation. However, a growing body of work has also focused on reconstructing intensity frames from the events, as this bridges the gap with the existing literature on appearance- and frame-based computer vision. Recent work has mostly approached this problem with neural networks trained on synthetic ground-truth data. In this work, we approach the intensity reconstruction problem from a self-supervised learning perspective for the first time. Our method, which leverages knowledge of the inner workings of event cameras, combines estimated optical flow with the event-based photometric constancy to train neural networks without the need for any ground-truth or synthetic data. Results across multiple datasets show that the performance of the proposed self-supervised approach is in line with the state of the art. Additionally, we propose a novel, lightweight neural network for optical flow estimation that achieves high-speed inference with only a minor drop in performance.
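To make the training signal concrete, the event-based photometric constancy used here can be sketched as a per-pixel residual: under the linearized event generation model, the brightness increment accumulated by events, ΔL(x) ≈ C · (signed event count), should match the increment predicted by the flow, −∇L(x) · u(x) Δt. The snippet below is a minimal, illustrative sketch (not the paper's actual loss); the function name, the use of a signed per-pixel event count, and the contrast threshold `C` are assumptions made for the example.

```python
import numpy as np

def photometric_constancy_loss(event_count, log_intensity, flow, dt, C=0.1):
    """Sketch of an event-based photometric constancy residual.

    event_count:   (H, W) signed per-pixel event count over the window
    log_intensity: (H, W) reconstructed log-brightness image L
    flow:          (H, W, 2) optical flow (u_x, u_y) in pixels/second
    dt:            duration of the event window in seconds
    C:             assumed contrast threshold of the sensor
    """
    # Brightness increment observed via events: each event contributes +/- C.
    delta_L_events = C * event_count
    # Brightness increment predicted by the flow: Delta L ~= -grad(L) . u * dt.
    gy, gx = np.gradient(log_intensity)  # gradients along y (rows), x (cols)
    delta_L_flow = -(gx * flow[..., 0] + gy * flow[..., 1]) * dt
    # Mean squared photometric-constancy residual over all pixels.
    return float(np.mean((delta_L_events - delta_L_flow) ** 2))
```

For a horizontal intensity ramp translating rightward, a flow field matching the true motion drives this residual to zero, while a mismatched flow leaves a large residual, which is what lets both the flow and the reconstruction be trained without ground truth.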
