Cross Field-Based Segmentation and Learning-Based Vectorization for Rectangular Windows

Detecting and vectorizing windows in building façades is important for building energy modeling, civil engineering, and architectural design. However, current applications still suffer from low accuracy and a lack of automation. In this article, we propose a new two-step workflow for window segmentation and vectorization from façade images. First, we propose a neural network architecture based on cross field learning and augmented by a grid-based self-attention module; it segments windows in rectified façade images and produces pixel-wise window blobs. Second, we propose a regression neural network augmented by squeeze-and-excitation (SE) attention blocks for window vectorization. This network takes the segmentation results together with the original façade image as input and directly outputs the positions of the window corners, yielding vectorized window objects with improved accuracy. To validate the effectiveness of our method, we carry out experiments on four public façade image datasets. In most cases, our method achieves higher accuracy for the final window prediction than the baseline methods in terms of intersection over union (IoU) score, F1 score, and pixel accuracy.
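The three evaluation measures named above can be illustrated on binary window masks. The following is a minimal sketch, not the evaluation code used in the article: the function name `mask_metrics`, the list-of-rows mask format, and the convention of returning 1.0 when a denominator is zero are all assumptions made for illustration, and the article may aggregate scores differently (for example per image or per dataset).

```python
from typing import Dict, Sequence

# A mask is a list of rows; each pixel is 1 (window) or 0 (background).
Mask = Sequence[Sequence[int]]


def mask_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compute IoU, F1 score, and pixel accuracy for two binary masks.

    Hypothetical helper: assumes both masks have the same shape.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # window predicted, window present
            elif p:
                fp += 1  # window predicted, background present
            elif t:
                fn += 1  # background predicted, window present
            else:
                tn += 1  # background predicted, background present

    total = tp + fp + fn + tn
    union = tp + fp + fn
    # Assumed convention: an empty denominator counts as a perfect score.
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    accuracy = (tp + tn) / total if total else 1.0
    return {"iou": iou, "f1": f1, "pixel_accuracy": accuracy}


# Toy example: the predicted window is shifted one pixel to the left.
pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
metrics = mask_metrics(pred, truth)
print(metrics)
```

In this toy case the masks share 2 window pixels out of a union of 6, so IoU is 1/3, F1 is 0.5, and pixel accuracy is 8/12. IoU penalizes the one-pixel shift more heavily than pixel accuracy does, which is one reason the three scores are typically reported together.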
