Unsupervised Stereo Matching Using Confidential Correspondence Consistency

Stereo matching aims to perceive the 3D geometric configuration of scenes and facilitates a variety of computer vision applications in advanced driver assistance systems (ADAS). Recently, deep convolutional neural networks (CNNs) have shown dramatic performance improvements in computing the matching cost for stereo matching. However, the performance of CNN-based approaches relies heavily on datasets, requiring a large amount of ground-truth data that takes tremendous effort to collect. To overcome this limitation, we present a novel framework to learn CNNs for matching cost computation in an unsupervised manner. Our method leverages image domain learning combined with stereo epipolar constraints. By exploiting the correspondence consistency between stereo images, our method selects putative positive samples in each training iteration and uses them to train the networks. We further propose a positive sample propagation scheme to obtain additional training samples. Our unsupervised learning method is evaluated with two kinds of network architectures, simple and precise CNNs, and shows performance comparable to that of state-of-the-art methods, including both supervised and unsupervised learning approaches, on the KITTI, Middlebury, HCI, and Yonsei datasets. This extensive evaluation demonstrates that the proposed learning framework can be applied to various real driving conditions.
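
The abstract does not spell out the exact consistency criterion, so the following is only a minimal sketch of one common way to select putative positive samples: a left-right consistency check along the same scanline (the epipolar line of a rectified pair). The function name `consistent_mask`, the tolerance `tol`, and the convention that a left pixel at column x with disparity d matches the right pixel at column x - d are assumptions made for illustration, not details taken from the paper.

```python
from typing import List


def consistent_mask(
    disp_left: List[List[int]],
    disp_right: List[List[int]],
    tol: int = 1,
) -> List[List[bool]]:
    """Mark left-image pixels whose disparity survives a left-right check.

    Assumed convention (hypothetical, for illustration): a left pixel at
    column x with disparity d corresponds to the right pixel at column x - d
    on the same row of a rectified stereo pair.
    """
    height = len(disp_left)
    width = len(disp_left[0]) if height else 0
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disp_left[y][x]
            xr = x - d  # matching column in the right image
            if d < 0 or xr < 0 or xr >= width:
                continue  # match falls outside the image: not a positive sample
            # Keep the pixel only if both views agree on the disparity.
            mask[y][x] = abs(d - disp_right[y][xr]) <= tol
    return mask


if __name__ == "__main__":
    left = [[0, 1, 1, 2]]
    right = [[1, 1, 0, 0]]
    print(consistent_mask(left, right, tol=0))
```

Pixels that pass such a check could serve as putative positive pairs for training a matching cost network, while the rest are left out of that iteration; how the paper actually scores and propagates samples may differ.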
