Beyond Local Reasoning for Stereo Confidence Estimation with Deep Learning

Confidence measures for stereo gained popularity in recent years due to their improved capability to detect outliers and the increasing number of applications exploiting these cues. In this field, convolutional neural networks achieved top-performance compared to other known techniques in the literature by processing local information to tell disparity assignments from outliers. Despite this outstanding achievements, all approaches rely on clues extracted with small receptive fields thus ignoring most of the overall image content. Therefore, in this paper, we propose to exploit nearby and farther clues available from image and disparity domains to obtain a more accurate confidence estimation. While local information is very effective for detecting high frequency patterns, it lacks insights from farther regions in the scene. On the other hand, enlarging the receptive field allows to include clues from farther regions but produces smoother uncertainty estimation, not particularly accurate when dealing with high frequency patterns. For these reasons, we propose in this paper a multi-stage cascaded network to combine the best of the two worlds. Extensive experiments on three datasets using three popular stereo algorithms prove that the proposed framework outperforms state-of-the-art confidence estimation techniques.

[1]  Stefano Mattoccia,et al.  Deep Stereo Fusion: Combining Multiple Disparity Hypotheses with Deep-Learning , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[2]  Ramin Zabih,et al.  Non-parametric Local Transforms for Computing Visual Correspondence , 1994, ECCV.

[3]  H. Hirschmüller Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[4]  Stefano Mattoccia,et al.  Efficient Confidence Measures for Embedded Stereo , 2017, ICIAP.

[5]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[6]  Rahul Nair,et al.  Ensemble Learning for Confidence Measures in Stereo Vision , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Stefano Mattoccia,et al.  Reliable Fusion of ToF and Stereo Depth Driven by Confidence Measures , 2016, ECCV.

[9]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Marc Pollefeys,et al.  Patch Based Confidence Prediction for Dense Disparity Map , 2016, BMVC.

[11]  Xiaoyan Hu,et al.  A Quantitative Evaluation of Confidence Measures for Stereo Vision , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Stefano Mattoccia,et al.  Even More Confident Predictions with Deep Machine-Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Stefano Mattoccia,et al.  Quantitative Evaluation of Confidence Measures in a Machine Learning World , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Luigi di Stefano,et al.  Learning confidence measures in the wild , 2017, BMVC.

[16]  Lior Wolf,et al.  Improved Stereo Matching with Constant Highway Networks and Reflective Confidence Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Luigi di Stefano,et al.  Unsupervised Adaptation for Deep Stereo , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Ajmal S. Mian,et al.  Face Recognition Using Sparse Approximated Nearest Points between Image Sets , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Seungryong Kim,et al.  Feature Augmentation for Learning Confidence Measure in Stereo Matching , 2017, IEEE Transactions on Image Processing.

[20]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Qiong Yan,et al.  Cascade Residual Learning: A Two-Stage Convolutional Neural Network for Stereo Matching , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[22]  Kuk-Jin Yoon,et al.  Leveraging stereo matching with learning-based confidence measures , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Nikos Komodakis,et al.  Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Mohsen Ardabilian,et al.  Stereo Matching Confidence Learning Based on Multi-modal Convolution Neural Networks , 2017, Representations, Analysis and Recognition of Shape and Motion from Imaging Data.

[25]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Horst Bischof,et al.  Using Self-Contradiction to Learn Confidence Measures in Stereo Vision , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Stefano Mattoccia,et al.  Learning to Predict Stereo Reliability Enforcing Local Consistency of Confidence Maps , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[29]  Stefano Mattoccia,et al.  Learning from scratch a confidence measure , 2016, BMVC.

[30]  Stefano Mattoccia,et al.  Learning a General-Purpose Confidence Measure Based on O(1) Features and a Smarter Aggregation Strategy for Semi Global Matching , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[31]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.