Stereo reconstruction using top-down cues

Scene understanding cues help stereo reconstruction. We integrate standard bottom-up reconstruction and top-down understanding. We demonstrate the integration of top-down cues with 3 state-of-the-art approaches. We evaluate on Middlebury 2014 and KITTI and find gains of up to 15%.

We present a framework that allows standard stereo reconstruction to be unified with a wide range of classic top-down cues from urban scene understanding. The resulting algorithm is analogous to the human visual system, in which conflicting interpretations of a scene caused by ambiguous data can be resolved using a higher-level understanding of urban environments. The cues reformulated within the framework include: recognising common arrangements of surface normals and semantic edges (e.g. concave, convex and occlusion boundaries), recognising connected or coplanar structures such as walls, and recognising collinear edges (which are common on repetitive structures such as windows). Recognising these common configurations has only recently become feasible, thanks to the emergence of large-scale reconstruction datasets. To demonstrate the importance and generality of scene understanding during stereo reconstruction, the proposed approach is integrated with 3 different state-of-the-art techniques for bottom-up stereo reconstruction. The use of high-level cues is shown to improve performance by up to 15% on the Middlebury 2014 and KITTI datasets. We further evaluate the technique using the recently proposed HCI stereo metrics, finding significant improvements in the quality of depth discontinuities, planar surfaces and thin structures.
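
The core idea, that an ambiguous bottom-up match can be overridden by a higher-level interpretation of the scene, can be illustrated with a toy example that uses only the coplanarity cue. This is a minimal sketch under stated assumptions, not the authors' formulation: the affine disparity-plane model, the confidence margin used to pick pixels for the fit, the L1 prior with weight `lam`, the synthetic cost volume and every function name (`fit_disparity_plane`, `refine_with_plane_prior`, `wta`, `confidence`) are hypothetical. It relies on the standard fact that, in a rectified stereo pair, a 3D plane projects to a disparity that is affine in image coordinates, d = a*x + b*y + c.

```python
# Toy illustration (all names hypothetical): a coplanarity prior resolving
# ambiguous matches in a small synthetic cost volume.


def fit_disparity_plane(points):
    """Least-squares fit of d = a*x + b*y + c to (x, y, d) samples."""
    # Accumulate the 3x3 normal equations (A^T A) p = A^T d, rows of A = [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atd = [0.0] * 3
    for x, y, d in points:
        row = (x, y, 1.0)
        for i in range(3):
            atd[i] += row[i] * d
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [atd[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= f * m[col][k]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def wta(costs):
    """Winner-take-all: index of the lowest cost (the first one on ties)."""
    return min(range(len(costs)), key=lambda d: costs[d])


def confidence(costs):
    """Margin between the best and second-best cost; 0 means ambiguous."""
    best, second = sorted(costs)[:2]
    return second - best


def refine_with_plane_prior(volume, lam=1.0, min_conf=0.5):
    """volume[y][x] lists the matching cost of every integer disparity."""
    # 1. Fit the plane only to pixels whose bottom-up match is unambiguous.
    confident = [(x, y, float(wta(costs)))
                 for y, row in enumerate(volume)
                 for x, costs in enumerate(row)
                 if confidence(costs) >= min_conf]
    a, b, c = fit_disparity_plane(confident)
    # 2. Re-label every pixel by minimising data cost + lam * |d - plane disparity|.
    refined = []
    for y, row in enumerate(volume):
        refined.append([wta([cost + lam * abs(d - (a * x + b * y + c))
                             for d, cost in enumerate(costs)])
                        for x, costs in enumerate(row)])
    return refined


# Synthetic scene: one slanted plane whose true disparity is x + 1.
W, H, D = 6, 3, 10
true = [[x + 1 for x in range(W)] for _ in range(H)]
# Textureless pixels: every disparity has the same cost, so matching is ambiguous.
textureless = {(2, 1), (3, 1), (4, 0)}
volume = [[[0.0] * D if (x, y) in textureless
           else [float(abs(d - true[y][x])) for d in range(D)]
           for x in range(W)] for y in range(H)]

print([wta(volume[1][x]) for x in range(W)])  # [1, 2, 0, 0, 5, 6]
print(refine_with_plane_prior(volume)[1])     # [1, 2, 3, 4, 5, 6]
```

Bottom-up winner-take-all picks an arbitrary disparity on the textureless pixels, while the plane recognised from the confident pixels fills them in correctly. In real imagery the plane hypothesis would need a robust fit and a limited weight `lam`, since pixels that genuinely lie off the plane are also pulled towards it.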

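Labelling semantic edges as concave, convex or occlusion boundaries can likewise be sketched on a single scanline. This is a hypothetical classifier, not the paper's method: `classify_boundary`, its two thresholds and the restriction to one scanline with an affine disparity profile on each side of the edge are assumptions. It relies on two standard facts: in a rectified pair a plane has an affine disparity profile, and the projective map from the scene to (image column, disparity) preserves convexity in front of the camera. A convex crease therefore makes the disparity slope decrease across the edge, a concave crease makes it increase, and a disparity jump that the two surfaces cannot bridge is treated as an occlusion.

```python
def classify_boundary(left, right, x_edge, jump_thresh=1.0, crease_thresh=0.05):
    """Label the boundary between two scanline segments of a disparity map.

    left and right are (slope, intercept) pairs describing affine disparity
    profiles d(x) = slope * x + intercept on either side of column x_edge.
    Larger disparity means closer to the camera. Thresholds are hypothetical.
    """
    d_left = left[0] * x_edge + left[1]
    d_right = right[0] * x_edge + right[1]
    if abs(d_left - d_right) > jump_thresh:
        return "occlusion"   # the two surfaces do not meet: a depth discontinuity
    bend = right[0] - left[0]  # change in disparity slope across the edge
    if bend < -crease_thresh:
        return "convex"      # profile bends down: the crease points toward the camera
    if bend > crease_thresh:
        return "concave"     # profile bends up: the crease points away from the camera
    return "planar"          # no measurable crease: treat as one coplanar surface


# Corner of a box facing the camera: disparity rises, then falls.
print(classify_boundary((1.0, 0.0), (-1.0, 20.0), 10))  # convex
# Inside corner of a room: disparity falls, then rises.
print(classify_boundary((-1.0, 20.0), (1.0, 0.0), 10))  # concave
# Foreground wall in front of a background wall.
print(classify_boundary((0.0, 10.0), (0.0, 4.0), 10))   # occlusion
```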