Location-Specific Embedding Learning for the Semantic Segmentation of Building Footprints on a Global Scale

In this paper, we analyze the feasibility of learning a latent embedding space from aerial and satellite imagery in order to capture semantic properties of geographical locations. We show that a deep neural network trained with a triplet loss function can be used effectively to obtain a location-specific embedding. Considering the problem of building footprint segmentation from aerial imagery of different cities, we combine these embeddings with clustering, both to train location-specific segmentation networks and to select the corresponding segmentation network at inference time. We evaluate our approach on the large-scale Inria Aerial Image Labeling Dataset, which contains aerial images of globally distributed cities. Our approach outperforms state-of-the-art approaches on the Intersection over Union (IoU) metric for the building class across all cities, and by more than 2% for specific cities.
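The pipeline described in the abstract (triplet-loss embedding, clustering, then picking a segmentation network per cluster) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the squared Euclidean distance, the margin value, the use of plain k-means as the clustering step, and the function names `triplet_loss`, `kmeans`, and `select_network`. The embedding network itself is omitted; the functions operate on embedding vectors that are assumed to have been computed already.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss on squared Euclidean distances.

    Each argument has shape (batch, dim). The margin value is an assumption.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def kmeans(embeddings, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in for the clustering step."""
    embeddings = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (fancy indexing copies).
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    labels = np.zeros(len(embeddings), dtype=int)
    for _ in range(iters):
        # Distance of every embedding to every centroid: shape (n, k).
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def select_network(embedding, centroids):
    """Return the index of the cluster (and hence network) nearest to an embedding."""
    embedding = np.asarray(embedding, dtype=float)
    return int(np.linalg.norm(centroids - embedding, axis=1).argmin())


# Hypothetical usage: toy 2-D "embeddings" from two well-separated locations.
toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = kmeans(toy, k=2)
net_id = select_network([0.2, 0.2], centroids)  # index of the network to apply
```

In this sketch, one segmentation network would be trained per cluster label, and `select_network` routes a new image's embedding to the network of its nearest centroid at inference time.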
