Guided Stereo Matching

Stereo is a prominent technique for inferring dense depth maps from images. Deep learning has further pushed the state of the art, making end-to-end architectures unrivaled when enough training data is available. However, deep networks suffer significant drops in accuracy when dealing with new environments. In this paper, we therefore introduce Guided Stereo Matching, a novel paradigm that mitigates this weakness by leveraging a small amount of sparse, yet reliable, depth measurements retrieved from an external source. The additional sparse cues required by our method can be obtained with any strategy (e.g., a LiDAR) and are used to enhance the features linked to the corresponding disparity hypotheses. Our formulation is general and fully differentiable. This makes it possible to exploit the additional sparse inputs both in pre-trained deep stereo networks and when training a new instance from scratch. Extensive experiments on three standard datasets and two state-of-the-art deep architectures show that, even with a small set of sparse input cues: i) the proposed paradigm enables significant improvements to pre-trained networks; ii) training from scratch notably increases accuracy and robustness to domain shifts; and iii) the paradigm is also effective with traditional stereo algorithms such as SGM.
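To make the idea of "enhancing features linked to disparity hypotheses" concrete, here is a minimal sketch under stated assumptions. It is not a definitive implementation. It assumes a Gaussian modulation centered on the sparse disparity measurement. At a pixel with a hint, each hypothesis d is scaled by k·exp(-(d - hint)²/(2c²)), so hypotheses near the measured disparity are amplified and distant ones are damped. Pixels without a hint are left unchanged. The function name `guide_features` and the parameters `k` and `c`, including their defaults, are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Optional


def guide_features(
    features: List[List[float]],  # features[d][ch]: feature vector of disparity hypothesis d at one pixel
    hint: Optional[float],        # sparse disparity from an external sensor, or None if unavailable
    k: float = 10.0,              # hypothetical peak gain applied at the hinted disparity
    c: float = 1.0,               # hypothetical Gaussian width, in disparity units
) -> List[List[float]]:
    """Scale every disparity hypothesis by a Gaussian centered on the sparse hint (assumed scheme)."""
    if hint is None:
        # No external measurement at this pixel: return an unmodified copy.
        return [list(feat) for feat in features]
    guided = []
    for d, feat in enumerate(features):
        # Weight peaks at k when d equals the hint and decays smoothly away from it.
        weight = k * math.exp(-((d - hint) ** 2) / (2.0 * c ** 2))
        guided.append([weight * x for x in feat])
    return guided


# Usage: 5 disparity hypotheses with 2 feature channels each, hint at disparity 2.
feats = [[1.0, 2.0] for _ in range(5)]
guided = guide_features(feats, hint=2.0)
unguided = guide_features(feats, hint=None)
```

The weight is a smooth function of the hypothesis index, so gradients can flow through the scaled features. This fits the abstract's claim that the formulation is fully differentiable. The sketch covers one pixel. Applying it to a full volume would simply repeat it at every pixel that has a hint.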
