Unsupervised Deep Event Stereo for Depth Estimation

Bio-inspired event cameras are considered an effective alternative to traditional frame-based cameras for stereo depth estimation, especially in challenging conditions such as low-light or high-speed environments. Recently, supervised deep learning-based event stereo matching methods have achieved significant performance improvements over traditional event stereo methods. However, supervised methods depend on ground-truth disparity maps for training, and large amounts of ground-truth disparity data are difficult to obtain. A feasible alternative is an unsupervised event stereo method that can be trained without ground-truth disparity maps. To this end, we propose the first unsupervised event stereo matching method that predicts dense disparity maps; it is trained by transforming the depth estimation problem into a warping-based reconstruction problem. We propose a novel unsupervised loss function that drives the network to minimize the feature-level epipolar correlation difference between the ground-truth intensity images and the warped images. Moreover, we propose a novel event embedding mechanism that uses both temporally and spatially neighboring events to capture spatio-temporal relationships among events for stereo matching. Experimental results show that the proposed method outperforms the baseline unsupervised methods by significant margins (e.g., up to 16.88% improvement) and achieves results comparable to those of existing supervised methods. Extensive ablation studies validate the efficacy of the proposed modules and architectural choices.
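
The warping-based reconstruction idea mentioned above can be illustrated with a minimal NumPy sketch. It assumes rectified grayscale views, so that a left-image pixel at column x corresponds to the right-image column x - d. The function names `warp_right_to_left` and `photometric_l1` are hypothetical, and the plain pixel-level L1 loss is a simplified stand-in, not the feature-level epipolar correlation loss proposed in the paper.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Assumes rectified stereo; uses linear interpolation along each row.
    """
    h, w = right.shape
    # Source x-coordinate in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)          # clamp samples that fall outside
    x0 = np.floor(xs).astype(int)           # left neighbour column
    x1 = np.minimum(x0 + 1, w - 1)          # right neighbour column
    frac = xs - x0                          # interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_l1(left, right, disparity):
    """Mean absolute error between the left image and the warped right image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))

# Toy example: a horizontal ramp whose right view is shifted by 2 pixels.
left = np.tile(np.arange(8.0), (4, 1))
right = np.roll(left, -2, axis=1)
true_disp = np.full(left.shape, 2.0)
zero_disp = np.zeros(left.shape)
loss_true = photometric_l1(left, right, true_disp)
loss_zero = photometric_l1(left, right, zero_disp)
```

In this toy case the reconstruction error is lower for the correct disparity than for zero disparity, which is the signal an unsupervised method can train on without ground-truth disparity maps.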
