Stereo Depth from Events Cameras: Concentrate and Focus on the Future

Neuromorphic cameras, or event cameras, mimic human vision by reporting changes in scene intensity, rather than capturing the whole scene at once as an image frame as conventional cameras do. Events are streamed data that often become dense when the scene changes or the camera moves rapidly. Under such rapid movement, events are overridden or missed when they are accumulated into a tensor for the model to learn from. To alleviate this problem, we propose to learn to concentrate on the dense events to produce a compact, highly detailed event representation for depth estimation. Specifically, we train a model with events from both the past and the future, but at inference use only past data together with the predicted future. We first estimate depth in an event-only setting, and further propose to combine images and events through a hierarchical event and intensity combination network for better depth estimation. Experiments in challenging real-world scenarios show that our method outperforms prior art at low computational cost. Code is available at: https://github.com/yonseivnl/se-cff.
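
The overriding issue mentioned above can be illustrated with a small sketch. It assumes a naive stacking scheme in which each pixel keeps only the polarity of its most recent event; the `(x, y, timestamp, polarity)` event layout and the function name `stack_last_polarity` are hypothetical choices for this illustration and are not taken from the paper or its code. It is not the proposed concentration network, only a demonstration of how dense events at one pixel get lost.

```python
from typing import List, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, float, int]


def stack_last_polarity(
    events: List[Event], width: int, height: int
) -> Tuple[List[List[int]], int]:
    """Naively stack events into an H x W grid, keeping the latest polarity.

    Returns the grid and the number of events that were overwritten
    by a later event at the same pixel.
    """
    frame = [[0] * width for _ in range(height)]
    hits = [[0] * width for _ in range(height)]
    # Process events in time order so later events overwrite earlier ones.
    for x, y, _t, p in sorted(events, key=lambda e: e[2]):
        frame[y][x] = p
        hits[y][x] += 1
    # Every event beyond the first at a pixel replaced an earlier one.
    overridden = sum(h - 1 for row in hits for h in row if h > 1)
    return frame, overridden


# Three events fire at pixel (0, 0) in quick succession; only the last survives.
events = [
    (0, 0, 0.001, +1),
    (0, 0, 0.002, -1),
    (0, 0, 0.003, +1),
    (1, 0, 0.002, -1),
]
frame, overridden = stack_last_polarity(events, width=2, height=1)
print(frame)       # [[1, -1]]
print(overridden)  # 2
```

In this toy case two of the four events leave no trace in the stacked grid, which is the kind of information loss that a learned, compact event representation aims to avoid.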
