3D all the way: Semantic segmentation of urban scenes from start to end in 3D

We propose a new approach for semantic segmentation of 3D city models. Starting from an SfM reconstruction of a street-side scene, we perform classification and facade splitting purely in 3D, obviating the need for slow image-based semantic segmentation methods. We show that a properly trained pure-3D approach produces high-quality labelings with a significant speed benefit (20x faster), allowing us to analyze entire streets in a matter of minutes. If speed is not of the essence, the 3D labeling can be combined with the results of a state-of-the-art 2D classifier, further boosting performance. We also propose a novel facade separation method based on semantic nuances between facades. Finally, inspired by the use of architectural principles for 2D facade labeling, we propose new 3D-specific principles and an efficient optimization scheme based on an integer quadratic programming formulation.
