A robust three-stage approach to large-scale urban scene recognition

To obtain a high-level description of urban scenes, we propose a three-stage approach to recognizing 3D reconstructed scenes with efficient representations. First, we develop a joint semantic labeling method that labels the triangular mesh-based representation by exploiting both image features and geometric features. The labeling is formulated over a conditional random field (CRF) that incorporates local spatial smoothness and multi-view consistency. Then, based on the labeled reconstructed meshes, we refine the segmentation of man-made objects in the recomposed global orthographic map with a graph partition algorithm, and propagate the coherent segmentation to the entire 3D mesh. Finally, we generate a compact, abstracted geometric representation for each man-made object, which is more visually appealing than the original cluttered models. This abstraction algorithm also uses a CRF formulation to partition building footprints into minimal sets of structural linear features, which are then used to construct profiles for large-scale scenes. The proposed recognition approach robustly handles reconstructions with poor geometry and connectivity, thanks to the higher-order CRF formulations, which impose the regularity priors that are ubiquitous in urban scenes. Each stage performs an independent, decoupled task. Extensive experiments demonstrate the superior performance of our approach in robustness, accuracy, and applicability.
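To make the role of the smoothness term concrete, the following is a minimal sketch of CRF-style labeling over mesh faces. It is not the method of the paper: it uses only a pairwise Potts term (no higher-order cliques and no multi-view consistency term) and swaps in iterated conditional modes (ICM) as a simple stand-in for the actual inference. The function name `icm_potts`, the parameter `smooth_weight`, and the toy costs are all assumptions made for illustration.

```python
import numpy as np

def icm_potts(unary, edges, smooth_weight=1.0, max_iters=20):
    """Approximately minimise a pairwise Potts energy with ICM.

    unary: (n_faces, n_labels) per-face label costs (hypothetical values).
    edges: iterable of (i, j) index pairs for adjacent mesh faces.
    """
    unary = np.asarray(unary, dtype=float)
    n_faces, n_labels = unary.shape
    neighbours = [[] for _ in range(n_faces)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    all_labels = np.arange(n_labels)
    labels = unary.argmin(axis=1)  # start from the unary-only labeling
    for _ in range(max_iters):
        changed = False
        for i in range(n_faces):
            cost = unary[i].copy()
            for j in neighbours[i]:
                # Potts term: pay smooth_weight for disagreeing with a neighbour.
                cost += smooth_weight * (all_labels != labels[j])
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Toy chain of five faces; face 2 weakly prefers label 1 on its own.
toy_unary = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]]
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
smoothed = icm_potts(toy_unary, toy_edges, smooth_weight=1.0)
unsmoothed = icm_potts(toy_unary, toy_edges, smooth_weight=0.0)
```

With the smoothness weight set to zero, the isolated face keeps its own weakly preferred label; with the Potts term enabled, it is pulled to agree with its neighbours. This is the kind of regularity prior the abstract refers to, in its simplest pairwise form.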
