Efficient Deep Learning for Stereo Matching

In the past year, convolutional neural networks have been shown to perform extremely well for stereo estimation. However, current architectures rely on siamese networks that exploit concatenation followed by further processing layers, requiring a minute of GPU computation per image pair. In contrast, in this paper we propose a matching network that is able to produce very accurate results in less than a second of GPU computation. Towards this goal, we exploit a product layer that simply computes the inner product between the two feature representations produced by the branches of a siamese architecture. We train our network by treating the problem as multi-class classification, where the classes are all possible disparities. This allows us to obtain calibrated scores, which result in much better matching performance than existing approaches.
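
The matching scheme described above can be sketched for a single scanline as follows. This is a minimal illustration, not the authors' implementation. The 2-D feature vectors are hypothetical stand-ins for the outputs of the two siamese branches, the function names are invented for this example, and the training loss is assumed to be a plain softmax followed by one-hot cross-entropy over the disparity classes. The sketch shows why the product layer is cheap: scoring each candidate disparity costs one inner product and needs no further learned layers.

```python
import math
from typing import List, Sequence

Feature = Sequence[float]


def inner_product(a: Feature, b: Feature) -> float:
    """Product layer: a single dot product between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def disparity_scores(left_row: List[Feature], right_row: List[Feature],
                     x: int, max_disp: int) -> List[float]:
    """Score every candidate disparity d for the left pixel at column x.

    Candidate d compares left column x with right column x - d; only
    candidates that stay inside the image (x - d >= 0) are returned.
    """
    left_feat = left_row[x]
    return [inner_product(left_feat, right_row[x - d])
            for d in range(min(max_disp, x) + 1)]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution over disparities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_disp: int) -> float:
    """Multi-class loss with the ground-truth disparity as the target class."""
    return -math.log(probs[true_disp])


def predict_disparity(probs: List[float]) -> int:
    """Winner-take-all: pick the most probable disparity class."""
    return max(range(len(probs)), key=lambda d: probs[d])


# Hypothetical features for one scanline (stand-ins for CNN outputs).
left_row = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right_row = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]

scores = disparity_scores(left_row, right_row, x=3, max_disp=2)
probs = softmax(scores)
loss = cross_entropy(probs, true_disp=2)  # assume ground truth d = 2
pred = predict_disparity(probs)
```

Here the left pixel at column 3 matches the right pixel at column 1 best, so the predicted disparity is 2. Because every candidate disparity competes in the same softmax, the scores form a normalized distribution over disparities, which is one way to read the abstract's notion of calibrated scores.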
