Learning Confidence Measures by Multi-modal Convolutional Neural Networks

In stereo matching, an estimate of the correctness of each stereo correspondence, also called confidence, is used to improve the dense disparity estimation result. In this paper, we propose a multi-modal deep learning approach for stereo matching confidence estimation. The input of our method is composed of two modalities: the initial disparity map and its reference color image. To combine these two modalities effectively, we explore and study multiple convolutional neural network (CNN) structures for our specific confidence prediction task. To the best of our knowledge, this is the first approach reported in the literature that combines multiple modalities with patch-based deep learning to predict confidence. Experiments on the KITTI datasets demonstrate that our multi-modal confidence network can significantly outperform state-of-the-art methods.
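To make the two-modality, patch-based input concrete, the following is a minimal sketch of how a disparity patch and the co-located color patch could be assembled for one pixel. The function name `extract_multimodal_patch`, the 9x9 patch size, the reflect padding, and the channel-wise stacking (early fusion) are all assumptions made for illustration; the abstract states only that several CNN structures for combining the modalities are explored, not which fusion scheme or patch size is used.

```python
import numpy as np

def extract_multimodal_patch(disparity, color, row, col, size=9):
    """Return a (size, size, 4) patch centred on (row, col).

    Channel 0 holds the disparity values, channels 1-3 hold the color values.
    Hypothetical helper: the name, patch size, and fusion scheme are assumptions.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the patch has a centre pixel")
    half = size // 2
    # Reflect-pad both modalities so patches near the border keep full size.
    disp_pad = np.pad(disparity, half, mode="reflect")
    color_pad = np.pad(color, ((half, half), (half, half), (0, 0)), mode="reflect")
    # After padding, original pixel (row, col) sits at (row + half, col + half).
    disp_patch = disp_pad[row:row + size, col:col + size]
    color_patch = color_pad[row:row + size, col:col + size, :]
    # Early fusion by channel stacking: one possible way to combine modalities.
    return np.concatenate([disp_patch[..., None], color_patch], axis=-1)

# Synthetic data standing in for an initial disparity map and its reference image.
rng = np.random.default_rng(0)
disparity = rng.uniform(0, 64, size=(20, 30)).astype(np.float32)
color = rng.uniform(0, 1, size=(20, 30, 3)).astype(np.float32)

patch = extract_multimodal_patch(disparity, color, row=5, col=7, size=9)
corner_patch = extract_multimodal_patch(disparity, color, row=0, col=0, size=9)
print(patch.shape)
```

A network that processes each modality in its own branch before merging (late fusion) would instead consume `disp_patch` and `color_patch` separately; the stacked form here is only the simplest variant to write down.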
