Learning Building Extraction in Aerial Scenes with Convolutional Networks

Extracting buildings from aerial scene images is an important task with many applications. However, the task is very difficult to automate because building appearance varies enormously, and it still relies heavily on manual work. To address this problem, we design a deep convolutional network with a simple structure that integrates activations from multiple layers for pixel-wise prediction, and we introduce the signed distance function of building boundaries as the output representation, which has greater representational power. To train the network, we leverage abundant building footprint data from geographic information systems (GIS) to generate large amounts of labeled data. The trained model achieves superior performance on datasets that are significantly larger and more complex than those used in prior work, demonstrating that the proposed method provides a promising and scalable solution for automating this labor-intensive task.
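To make the signed-distance output representation concrete, the following is a minimal sketch of how a binary building mask might be converted into a signed distance map. It is illustrative only and not the implementation used in this work: the sign convention (positive inside buildings, negative outside), the 4-neighbor boundary definition, the optional `truncate` parameter, and the function names `boundary_pixels` and `signed_distance` are all assumptions made for the example. It uses a brute-force search that is only practical for small grids.

```python
import math


def boundary_pixels(mask):
    """Return building pixels that touch a non-building 4-neighbor."""
    h, w = len(mask), len(mask[0])
    boundary = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not mask[rr][cc]:
                    boundary.append((r, c))
                    break
    return boundary


def signed_distance(mask, truncate=None):
    """Signed Euclidean distance of every pixel to the nearest boundary pixel.

    Assumed convention: positive inside buildings, negative outside,
    zero on boundary pixels. If `truncate` is given, magnitudes are clipped.
    """
    boundary = boundary_pixels(mask)
    if not boundary:
        raise ValueError("mask contains no building boundary")
    result = []
    for r, row in enumerate(mask):
        out_row = []
        for c, inside in enumerate(row):
            # Brute-force nearest-boundary search (fine for small examples).
            d = min(math.hypot(r - br, c - bc) for br, bc in boundary)
            if truncate is not None:
                d = min(d, truncate)
            out_row.append(d if inside else -d)
        result.append(out_row)
    return result


# Toy 5x5 scene with one 3x3 building footprint in the center.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdf = signed_distance(mask)
sdf_clipped = signed_distance(mask, truncate=1.0)
```

Unlike a binary label, each value in such a map encodes both whether a pixel belongs to a building and how far it lies from the nearest boundary, which is one way to read the "enhanced representation power" mentioned above. For full-size images, a linear-time distance transform would replace the brute-force search.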
