Uncertainty Guided Depth Fusion for Spike Camera

Depth estimation is essential for many real-world applications such as autonomous driving. However, it suffers severe performance degradation in high-velocity scenarios, since traditional cameras can only capture blurred images. To address this problem, the spike camera is designed to capture pixel-wise luminance intensity at a high frame rate. However, depth estimation with a spike camera remains very challenging for traditional monocular or stereo depth estimation algorithms, which rely on photometric consistency. In this paper, we propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework that fuses the predictions of monocular and stereo depth estimation networks for the spike camera. Our framework is motivated by the observation that stereo spike depth estimation achieves better results at close range, while monocular spike depth estimation obtains better results at long range. Therefore, we introduce a dual-task depth estimation architecture with a joint training strategy, and estimate the distributed uncertainty to fuse the monocular and stereo results. To demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K, which contains 20K paired samples. UGDF achieves state-of-the-art results on CitySpike20K, surpassing all monocular and stereo spike depth estimation baselines. We conduct extensive experiments to evaluate the effectiveness and generalization of our method on CitySpike20K. To the best of our knowledge, our framework is the first dual-task fusion framework for spike camera depth estimation. Code and dataset will be released.
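To make the fusion idea concrete, the following is a minimal sketch of how two depth maps could be combined per pixel from their estimated uncertainties. It is an illustration only, not the UGDF implementation: the function names, the hard-selection rule, and the inverse-uncertainty weighting are all assumptions, since the abstract does not specify the fusion rule.

```python
import numpy as np


def fuse_depth_hard(mono_depth, mono_unc, stereo_depth, stereo_unc):
    """Hypothetical hard fusion: per pixel, keep the lower-uncertainty prediction."""
    mono_depth = np.asarray(mono_depth, dtype=float)
    stereo_depth = np.asarray(stereo_depth, dtype=float)
    mono_unc = np.asarray(mono_unc, dtype=float)
    stereo_unc = np.asarray(stereo_unc, dtype=float)
    # Ties go to the stereo branch (an arbitrary choice for this sketch).
    use_stereo = stereo_unc <= mono_unc
    return np.where(use_stereo, stereo_depth, mono_depth)


def fuse_depth_soft(mono_depth, mono_unc, stereo_depth, stereo_unc, eps=1e-6):
    """Hypothetical soft fusion: average weighted by inverse uncertainty."""
    mono_depth = np.asarray(mono_depth, dtype=float)
    stereo_depth = np.asarray(stereo_depth, dtype=float)
    # eps avoids division by zero when an uncertainty is exactly 0.
    w_mono = 1.0 / (np.asarray(mono_unc, dtype=float) + eps)
    w_stereo = 1.0 / (np.asarray(stereo_unc, dtype=float) + eps)
    return (w_mono * mono_depth + w_stereo * stereo_depth) / (w_mono + w_stereo)


# Toy 1x2 depth maps (metres): a near pixel and a far pixel.
mono = [[5.0, 80.0]]
stereo = [[4.0, 60.0]]
mono_u = [[0.9, 0.2]]    # monocular branch is more confident at long range
stereo_u = [[0.1, 0.8]]  # stereo branch is more confident at close range
fused = fuse_depth_hard(mono, mono_u, stereo, stereo_u)
```

In this toy input the near pixel takes the stereo value and the far pixel takes the monocular value, mirroring the close-range versus long-range behaviour described above. The soft variant blends the two predictions instead of switching between them.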
