Paper information - Location-Specific Embedding Learning for the Semantic Segmentation of Building Footprints on a Global Scale

In this paper, we analyze the feasibility of learning a latent embedding space from aerial and satellite imagery in order to capture semantic properties of geographical locations. We show that a deep neural network trained with a triplet loss function can be used effectively to obtain a location-specific embedding. Considering the problem of building footprint segmentation from aerial imagery of different cities, we combine these embeddings with clustering, both to train location-specific segmentation networks and to select the corresponding segmentation network at inference time. We evaluate our approach on the large-scale Inria Aerial Image Labeling Dataset, which contains aerial images of globally distributed cities. Our approach outperforms state-of-the-art approaches on the Intersection over Union (IoU) metric for the building class across all cities, and by more than 2% for specific cities.

Benjamin Bischke | Jörn Hees | Andreas Dengel | Patrick Helber
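
The two mechanisms named in the abstract, a triplet loss for learning the embedding and cluster-based selection of a segmentation network at inference time, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance function, the margin, or the clustering algorithm, so the Euclidean distance, the margin of 1.0, the nearest-centroid assignment, and all names (`triplet_loss`, `select_cluster`, `segmenters`) are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumption; the abstract does not state one.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def select_cluster(embedding, centroids):
    """Return the index of the centroid closest to the embedding.

    Nearest-centroid assignment is an assumed stand-in for whatever
    clustering method the paper actually uses.
    """
    return min(range(len(centroids)), key=lambda i: euclidean(embedding, centroids[i]))


# Hypothetical 2-D embeddings: the positive comes from the same location
# as the anchor, the negative from a different location.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]

# Well-separated triplet: the negative is already far away, so the loss is zero.
loss_easy = triplet_loss(anchor, positive, negative)

# Swapped roles: the "positive" is far and the "negative" is near, so the loss is large.
loss_hard = triplet_loss(anchor, negative, positive)

# One hypothetical segmentation network per cluster, selected by nearest centroid.
centroids = [[0.0, 0.0], [3.0, 4.0]]
segmenters = ["segmenter_cluster_0", "segmenter_cluster_1"]
chosen = segmenters[select_cluster([2.5, 3.5], centroids)]
```

In this sketch the strings in `segmenters` stand in for trained networks; in practice each entry would be a model trained only on the images assigned to that cluster.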
