DeepFacade: A Deep Learning Approach to Facade Parsing With Symmetric Loss

Parsing building facades into procedural grammars plays an important role in 3D building model generation, a long-standing goal in computer vision. Deep learning is a promising approach to facade parsing; however, directly applying standard deep learning methods does not always yield optimal results. This is primarily due to two reasons: 1) it is nontrivial to train existing semantic segmentation networks for facade parsing, e.g., Fully Convolutional Networks (FCN), which are usually weak at predicting fine-grained shapes (J. Long et al., 2015); and 2) building facades are man-made structures with highly regular shape priors, and this prior knowledge plays an important role in facade parsing, yet how to integrate it into deep neural networks remains an open problem. In this paper, we present a novel symmetric loss function that can be used in deep neural networks for end-to-end training. This loss is based on the assumption that most windows and doors have a highly symmetric rectangular shape, and it penalizes all window predictions that are not rectangular. This prior knowledge is thereby integrated directly into the end-to-end training process. Quantitative evaluation demonstrates that our method significantly outperforms previous state-of-the-art methods on five popular facade parsing datasets. Qualitative results show that our method helps deep convolutional neural networks predict more accurate, visually pleasing, and symmetric shapes. To the best of our knowledge, we are the first to incorporate a symmetry constraint into end-to-end training of deep neural networks for facade parsing.
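The abstract does not give the formula of the symmetric loss, so the following is only a minimal illustrative sketch of the general idea of penalizing non-rectangular window predictions. It is not the authors' loss. The function names `mirror_penalty` and `box_fill_penalty`, the threshold of 0.5, and the toy masks are all assumptions made for illustration, and NumPy is used in place of a trainable deep learning framework.

```python
import numpy as np


def mirror_penalty(prob):
    """Mean absolute difference between a patch and its two mirror images.

    `prob` is a 2D array of per-pixel window probabilities for one patch.
    An axis-aligned rectangle centered in the patch scores 0.
    (Hypothetical helper, not taken from the paper.)
    """
    left_right = np.abs(prob - prob[:, ::-1]).mean()
    up_down = np.abs(prob - prob[::-1, :]).mean()
    return float(0.5 * (left_right + up_down))


def box_fill_penalty(prob, threshold=0.5):
    """Average missing probability inside the tight bounding box of the mask.

    The mask is `prob >= threshold`; a prediction that fills its bounding
    box completely (a perfect rectangle) scores 0.
    (Hypothetical helper, not taken from the paper.)
    """
    mask = prob >= threshold
    if not mask.any():
        return 0.0
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    crop = prob[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return float((1.0 - crop).mean())


# Toy predictions: a clean rectangle and the same rectangle with a corner cut out.
rect = np.zeros((8, 8))
rect[2:6, 1:7] = 1.0

l_shape = rect.copy()
l_shape[2:4, 4:7] = 0.0

print(box_fill_penalty(rect), mirror_penalty(rect))
print(box_fill_penalty(l_shape), mirror_penalty(l_shape))
```

In this sketch the clean rectangle incurs no penalty, while the prediction with a missing corner is penalized by both terms. A term used for actual end-to-end training would need to be differentiable with respect to the network output, which the thresholded bounding box here is not.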
