LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation

We present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input (matching cost, disparity, and color image) through deep networks. The proposed network, termed Locally Adaptive Fusion Networks (LAF-Net), learns locally-varying attention and scale maps to fuse the tri-modal confidence features. The attention inference networks encode the importance of each of the tri-modal confidence features and then concatenate the features, weighted by the attention maps, in an adaptive and dynamic fashion. This yields a more effective fusion of the heterogeneous features than the simple concatenation commonly used in conventional approaches. In addition, to encode the confidence features with locally-varying receptive fields, the scale inference networks learn a scale map and warp the fused confidence features through convolutional spatial transformer networks. Finally, the confidence map is progressively estimated in the recursive refinement networks to enforce spatial context and local consistency. Experimental results show that this model outperforms state-of-the-art methods on various benchmarks.
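The attention-weighted concatenation described above can be illustrated with a minimal sketch for a single pixel. This is not the authors' implementation: the function names `softmax` and `fuse_pixel`, the use of a softmax to normalize the three attention values, and the plain-list feature vectors are all assumptions made for illustration. In LAF-Net the attention values are predicted by learned networks, whereas here they are passed in as given numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pixel(cost_feat, disp_feat, color_feat, attn_logits):
    """Scale each modality's feature vector by its attention weight,
    then concatenate the three scaled vectors.

    attn_logits holds one raw attention score per modality, in the
    order (matching cost, disparity, color). Normalizing them with a
    softmax is an assumption of this sketch, not a detail taken from
    the paper.
    """
    weights = softmax(attn_logits)
    fused = []
    for weight, feat in zip(weights, (cost_feat, disp_feat, color_feat)):
        fused.extend(weight * value for value in feat)
    return weights, fused


# Hypothetical per-pixel features for the three modalities.
weights, fused = fuse_pixel(
    cost_feat=[1.0, 2.0],
    disp_feat=[3.0],
    color_feat=[4.0, 5.0],
    attn_logits=[0.0, 0.0, 0.0],
)
```

With equal attention scores the three modalities receive equal weight, so the result is simple concatenation up to a constant factor. Unequal scores, which can differ at every pixel, let one modality dominate where it is more informative.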
