Low-level vision by consensus in a spatial hierarchy of regions

We introduce a multi-scale framework for low-level vision, where the goal is to estimate physical scene values from image data, such as depth from stereo image pairs. The framework uses a dense, overlapping set of image regions at multiple scales and a “local model,” such as a slanted-plane model for stereo disparity, that is expected to be valid piecewise across the visual field. Estimation is cast as optimization over a dichotomous mixture of variables, simultaneously determining which regions are inliers with respect to the local model (binary variables) and the correct coordinates in the local model space for each inlying region (continuous variables). When the regions are organized into a multi-scale hierarchy, optimization can be carried out in an efficient, parallel architecture in which distributed computational units iteratively perform calculations and share information through sparse connections between parents and children. The framework performs well on a standard benchmark for binocular stereo, and it produces a distributional scene representation that is appropriate for combining with higher-level reasoning and other low-level cues.
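To make the split between binary and continuous variables concrete, here is a minimal sketch in Python. It is not the consensus optimization the abstract describes: it fits each region independently by least squares and thresholds the residual, with no information shared between parents and children. The names `fit_plane` and `label_regions`, the region sizes, the stride, and the tolerance `tol` are all assumptions made for illustration, and the input is a synthetic disparity map rather than stereo matching costs.

```python
import numpy as np

def fit_plane(xs, ys, ds):
    """Least-squares slanted plane d = a*x + b*y + c; returns (a, b, c) and RMS residual."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    theta, *_ = np.linalg.lstsq(A, ds, rcond=None)
    rms = np.sqrt(np.mean((A @ theta - ds) ** 2))
    return theta, rms

def label_regions(disparity, sizes=(4, 8, 16), stride=2, tol=0.1):
    """For dense, overlapping square regions at several scales, return a dict
    mapping (x0, y0, size) -> (is_inlier, theta).
    is_inlier is the binary variable; theta holds the continuous plane coordinates.
    Hypothetical simplification: regions are treated independently."""
    h, w = disparity.shape
    labels = {}
    for s in sizes:
        for y0 in range(0, h - s + 1, stride):
            for x0 in range(0, w - s + 1, stride):
                ys, xs = np.mgrid[y0:y0 + s, x0:x0 + s]
                patch = disparity[y0:y0 + s, x0:x0 + s]
                theta, rms = fit_plane(xs.ravel().astype(float),
                                       ys.ravel().astype(float),
                                       patch.ravel())
                labels[(x0, y0, s)] = (bool(rms < tol), theta)
    return labels

# Synthetic scene: a slanted plane on the left, a fronto-parallel plane on the right,
# separated by a disparity discontinuity at x = 16.
yy, xx = np.mgrid[0:32, 0:32]
disparity = np.where(xx < 16, 0.1 * xx + 2.0, 10.0)
regions = label_regions(disparity)
```

In this toy scene, regions that lie entirely on one surface are labeled inliers and recover that surface's plane parameters, while regions that straddle the discontinuity are labeled outliers. The framework in the abstract goes further by determining both sets of variables jointly across the hierarchy.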
