Large-Scale Semantic 3D Reconstruction: An Adaptive Multi-resolution Model for Multi-class Volumetric Labeling

We propose an adaptive multi-resolution formulation of semantic 3D reconstruction. Given a set of images of a scene, semantic 3D reconstruction aims to densely reconstruct both the 3D shape of the scene and a segmentation into semantic object classes. Jointly reasoning about shape and class makes it possible to exploit class-specific shape priors (e.g., building walls should be smooth and vertical and, conversely, smooth vertical surfaces are likely to be building walls), which improves the reconstruction. So far, semantic 3D reconstruction methods have been limited to small scenes and low resolution because of their large memory footprint and computational cost. To scale them up to large scenes, we propose a hierarchical scheme that refines the reconstruction only in regions likely to contain a surface, exploiting the fact that high spatial resolution and high numerical precision are required only in those regions. Our scheme amounts to solving a sequence of convex optimizations while progressively removing constraints, in such a way that the energy in each iteration is the tightest possible approximation of the underlying full-resolution energy. In our experiments, the method saves up to 98% of memory and 95% of computation time without any loss of accuracy.
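The idea of refining only where a surface is likely can be illustrated with a minimal sketch. It is not the method of the paper: it contains no convex optimization and no semantic labels, and it only shows why adaptive subdivision needs far fewer cells than a uniform grid. The names `sphere_sdf` and `adaptive_leaves` are hypothetical, and a sphere's signed distance function is an assumed stand-in for the "likely to contain a surface" test.

```python
import itertools
import math


def sphere_sdf(point, center=(0.5, 0.5, 0.5), radius=0.3):
    """Signed distance to a sphere (assumed stand-in for real surface evidence)."""
    return math.dist(point, center) - radius


def adaptive_leaves(max_depth, sdf=sphere_sdf):
    """Subdivide the unit cube, splitting only cells that may contain the surface.

    Returns a list of leaf cells as (origin, size, depth) tuples.
    """
    stack = [((0.0, 0.0, 0.0), 1.0, 0)]
    leaves = []
    while stack:
        origin, size, depth = stack.pop()
        center = tuple(o + 0.5 * size for o in origin)
        # Conservative test: if the surface passes through the cell, the
        # distance from the cell center to it is at most half the diagonal.
        half_diagonal = 0.5 * size * math.sqrt(3.0)
        may_contain_surface = abs(sdf(center)) <= half_diagonal
        if may_contain_surface and depth < max_depth:
            half = 0.5 * size
            # Split into 8 children, octree style.
            for offset in itertools.product((0, 1), repeat=3):
                child = tuple(o + half * k for o, k in zip(origin, offset))
                stack.append((child, half, depth + 1))
        else:
            leaves.append((origin, size, depth))
    return leaves


if __name__ == "__main__":
    depth = 6
    leaves = adaptive_leaves(depth)
    print(f"{len(leaves)} adaptive cells vs {8 ** depth} uniform cells")
```

Cells far from the surface stay coarse, so the number of cells grows roughly with the surface area instead of the volume. The paper's scheme additionally solves a convex labeling problem at each level of the hierarchy, which this sketch leaves out.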
