A Comparison of Deep Learning Architectures for Semantic Mapping of Very High Resolution Images

Semantic mapping of land cover is a key, but challenging, problem in remote sensing. Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have shown outstanding performance in this task. In order to develop refined deep learning pipeline for meeting the rising need for accurate semantic mapping in remote sensing images, this paper study and compare a number of advanced deep learning segmentation architectures, which have obtained state-of-the-art results on computer vision contests like the Pascal VOC. To further analyze and compare the effectiveness of some elaborate layers and underlying structures introduced by these architectures, we evaluate them by re-implementing, train and test them on ISPRS Potsdam dataset. Our results show that a promising performance with overall Fl_score above 87% and mIoU of 79% can be obtained by only using the RGB images, without any post-processing such as conditional random field (CRF) smoothing. At last, we propose several possible approaches to further enhance the deep learning architectures to better deal with high-resolution aerial images. We therefore consider this work to be helpful for the remote sensing research community.

[1]  Alexandre Boulch,et al.  Deep Learning for Urban Remote Sensing ( Invited Paper ) , 2016 .

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[4]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[5]  Garrison W. Cottrell,et al.  Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Sildomar T. Monteiro,et al.  Dense Semantic Labeling of Very-High-Resolution Aerial Imagery and LiDAR with Fully-Convolutional Neural Networks and Higher-Order CRFs , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[9]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[10]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jefersson Alex dos Santos,et al.  Towards better exploiting convolutional neural networks for remote sensing scene classification , 2016, Pattern Recognit..

[13]  Michael Kampffmeyer,et al.  Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[15]  José García Rodríguez,et al.  A Review on Deep Learning Techniques Applied to Semantic Segmentation , 2017, ArXiv.

[16]  Bertrand Le Saux,et al.  Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks , 2016, ACCV.

[17]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).