Semantic segmentation of high spatial resolution images with deep neural networks

Reliable delineation of urban land is fundamental to applications such as infrastructure management and urban planning. An accurate semantic segmentation approach assigns a reliable ground-object class to each pixel of remotely sensed imagery. In this paper, we propose an end-to-end deep learning architecture for pixel-level understanding of high spatial resolution remote sensing images. Both local and global contextual information are considered: local contexts are learned by a deep residual network, and multi-scale global contexts are extracted by a pyramid pooling module. These contextual features are concatenated to predict a label for each pixel. Furthermore, multiple additional losses are proposed to help the network optimize multi-level features from images of different resolutions simultaneously. Two public datasets, Vaihingen and Potsdam, are used to assess the performance of the proposed network. Comparison with published state-of-the-art algorithms demonstrates the effectiveness of our approach.
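To illustrate the step in which local features and multi-scale global contexts are concatenated, the following is a minimal NumPy sketch of a pyramid pooling operation. It is not the architecture evaluated in the paper: the bin sizes (1, 2, 3, 6), the nearest-neighbour upsampling, and the function names `adaptive_avg_pool` and `pyramid_pool` are assumptions made for illustration, and the learned convolutions that would normally follow each pooling level are omitted.

```python
import numpy as np


def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map down to (C, bins, bins)."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins), dtype=feat.dtype)
    for i in range(bins):
        # Row window: floor of the start, ceiling of the end.
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def pyramid_pool(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate local features with upsampled multi-scale pooled contexts.

    feat is a (C, H, W) array standing in for the residual network output.
    The result has C * (1 + len(bin_sizes)) channels.
    """
    c, h, w = feat.shape
    levels = [feat]  # local context, kept unchanged
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(feat, bins)
        # Nearest-neighbour upsampling back to (H, W) (an assumption;
        # bilinear interpolation is the more common choice).
        rows = (np.arange(h) * bins) // h
        cols = (np.arange(w) * bins) // w
        levels.append(pooled[:, rows][:, :, cols])
    return np.concatenate(levels, axis=0)


if __name__ == "__main__":
    features = np.random.rand(4, 12, 12)
    fused = pyramid_pool(features)
    print(fused.shape)
```

In this sketch the coarsest level (one bin) carries the global mean of each channel to every pixel, while finer levels carry regional averages, so a per-pixel classifier applied to the concatenated output sees both local and global context.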
