Learning to refine depth for robust stereo estimation

Abstract

Traditional depth estimation from stereo images is usually formulated as a patch-matching problem. This requires post-processing stages to impose smoothness and to handle depth discontinuities and occlusions. Recent deep network approaches directly learn a regressor for the entire disparity map, but they still suffer from large errors near depth discontinuities. In this paper, we propose a novel method to refine the disparity maps generated by deep regression networks. Instead of relying on ad hoc post-processing, we learn a unified deep network model that predicts a confidence map and the disparity gradients from the feature representation learned by the regression network. We integrate the initial disparity estimate, the confidence map and the disparity gradients into a continuous Markov Random Field (MRF) for depth refinement, which is capable of representing rich surface structures. Our disparity MRF model can be solved in closed form via efficient global optimization. We evaluate our approach on both synthetic and real-world datasets. The results show that it achieves state-of-the-art performance and produces more structure-preserving disparity maps, with smaller errors in the neighborhood of depth boundaries.
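To illustrate why such a model can be solved in closed form, here is a minimal sketch. It assumes a quadratic continuous MRF on a single image scanline, and the abstract does not give the actual potentials or weights, so this energy is an assumption. A per-pixel confidence c_i pulls the refined disparity d_i toward the network's initial estimate d0_i. A pairwise term, weighted by a hypothetical parameter lam, pulls each forward difference d_{i+1} - d_i toward a predicted gradient g_i. Setting the derivative of E(d) = sum_i c_i (d_i - d0_i)^2 + lam * sum_i (d_{i+1} - d_i - g_i)^2 to zero yields a symmetric tridiagonal linear system. The sketch solves it with the Thomas algorithm. The function name `refine_scanline` and its parameters are illustrative only. A full 2D version would replace the tridiagonal solve with a sparse linear solve over the pixel grid.

```python
from typing import List


def refine_scanline(d0: List[float], conf: List[float],
                    grad: List[float], lam: float = 1.0) -> List[float]:
    """Hypothetical 1D refinement: minimise
    sum_i conf[i]*(d[i]-d0[i])**2 + lam*sum_i (d[i+1]-d[i]-grad[i])**2.

    d0   : initial disparities from the regression network (length n)
    conf : non-negative confidence weights, at least one > 0 (length n)
    grad : predicted forward differences d[i+1]-d[i] (length n-1)
    """
    n = len(d0)
    if n == 0:
        return []
    if len(conf) != n or len(grad) != n - 1:
        raise ValueError("expected len(conf) == n and len(grad) == n - 1")

    # Build the normal equations A d = rhs; A is tridiagonal (sub, diag, sup).
    sub, diag, sup, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(n):
        diag[k] = conf[k]
        rhs[k] = conf[k] * d0[k]
        if k > 0:          # pairwise edge (k-1, k)
            diag[k] += lam
            sub[k] = -lam
            rhs[k] += lam * grad[k - 1]
        if k < n - 1:      # pairwise edge (k, k+1)
            diag[k] += lam
            sup[k] = -lam
            rhs[k] -= lam * grad[k]

    # Thomas algorithm: forward elimination (A is positive definite here,
    # so every pivot is positive and no pivoting is needed).
    cp, rp = [0.0] * n, [0.0] * n
    cp[0] = sup[0] / diag[0]
    rp[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - sub[k] * cp[k - 1]
        cp[k] = sup[k] / denom
        rp[k] = (rhs[k] - sub[k] * rp[k - 1]) / denom

    # Back substitution.
    d = [0.0] * n
    d[n - 1] = rp[n - 1]
    for k in range(n - 2, -1, -1):
        d[k] = rp[k] - cp[k] * d[k + 1]
    return d


# Usage: the middle pixel sits on a depth boundary and has an unreliable
# initial value (confidence 0). The predicted gradients say "flat, then a
# jump of 10", so the refined scanline snaps to a clean step edge.
refined = refine_scanline(d0=[10.0, 10.0, 14.0, 20.0, 20.0],
                          conf=[1.0, 1.0, 0.0, 1.0, 1.0],
                          grad=[0.0, 0.0, 10.0, 0.0])
print([round(v, 6) for v in refined])  # [10.0, 10.0, 10.0, 20.0, 20.0]
```

In this toy case, the step edge [10, 10, 10, 20, 20] satisfies every confident data term and every gradient term exactly. It therefore has zero energy and is the unique minimiser. In this reading, the confidence map decides where the initial estimate is trusted, and the predicted gradients decide how the remaining pixels are filled in.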
