PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images

Most state-of-the-art instance segmentation methods produce binary segmentation masks; however, geographic and cartographic applications typically require precise vector polygons of the extracted objects rather than rasterized output. This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons. The model predicts the connection strength between each pair of vertices using a graph neural network and estimates the assignments by solving a differentiable optimal transport problem. Moreover, the vertex positions are optimized by minimizing a combined segmentation and polygonal angle difference loss. PolyWorld significantly outperforms the state of the art in building polygonization: it not only achieves notable quantitative results but also produces visually pleasing building polygons. Code and trained weights will soon be available on GitHub.
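To make the assignment step concrete, the following is a minimal sketch of how pairwise connection scores could be turned into closed polygons. It assumes Sinkhorn normalization as the optimal transport solver, which is one common differentiable choice and not necessarily the authors' exact formulation. The function names (`sinkhorn`, `successors`, `trace_polygons`), the toy score matrix, and the iteration count are all hypothetical and chosen for illustration only; in the model the scores would come from the graph neural network.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores).

    The result approximates a doubly stochastic matrix: a soft
    permutation in which entry (i, j) is the strength of the
    connection "vertex i is followed by vertex j".
    """
    n = len(scores)
    # Subtract the global maximum before exponentiating for stability.
    top = max(max(row) for row in scores)
    p = [[math.exp(s - top) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Column normalization: every column sums to 1.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


def successors(p):
    """Pick, for each vertex, the most strongly connected next vertex."""
    return [max(range(len(row)), key=row.__getitem__) for row in p]


def trace_polygons(nxt):
    """Follow the successor map and collect each closed cycle once.

    Assumes nxt is a valid permutation; a row-wise argmax does not
    guarantee this in general, so real code would need a check.
    """
    seen, polygons = set(), []
    for start in range(len(nxt)):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = nxt[v]
        polygons.append(cycle)
    return polygons


# Toy scores (hypothetical): six vertices forming two triangles,
# 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 5 -> 3.
N = 6
strong = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)}
scores = [[5.0 if (i, j) in strong else 0.0 for j in range(N)]
          for i in range(N)]

assignment = sinkhorn(scores)
polygons = trace_polygons(successors(assignment))
print(polygons)  # [[0, 1, 2], [3, 4, 5]]
```

Because every step of the normalization is a differentiable operation, a loss on the soft assignment matrix can be backpropagated to the scores; the hard argmax and cycle tracing are only needed at inference time to read out the polygons.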
