Real-Time Dense Semantic Labeling with Dual-Path Framework for High-Resolution Remote Sensing Image

Dense semantic labeling plays a pivotal role in high-resolution remote sensing image research. It provides pixel-level classification which is crucial in land cover mapping and urban planning. With the recent success of the convolutional neural network (CNN), accuracy has been greatly improved by previous works. However, most networks boost performance by involving too many parameters and computational overheads, which results in more inference time and hardware resources, while some attempts with light-weight networks do not achieve satisfactory results due to the insufficient feature extraction ability. In this work, we propose an efficient light-weight CNN based on dual-path architecture to address this issue. Our model utilizes three convolution layers as the spatial path to enhance the extraction of spatial information. Meanwhile, we develop the context path with the multi-fiber network (MFNet) followed by the pyramid pooling module (PPM) to obtain a sufficient receptive field. On top of these two paths, we adopt the channel attention block to refine the features from the context path and apply a feature fusion module to combine spatial information with context information. Moreover, a weighted cascade loss function is employed to enhance the learning procedure. With all these components, the performance can be significantly improved. Experiments on the Potsdam and Vaihingen datasets demonstrate that our network performs better than other light-weight networks, even some classic networks. Compared to the state-of-the-art U-Net, our model achieves higher accuracy on the two datasets with 2.5 times less network parameters and 22 times less computational floating point operations (FLOPs).

[1]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[4]  Haslina Arshad,et al.  Geometric feature descriptor and dissimilarity-based registration of remotely sensed imagery , 2018, PloS one.

[5]  Shuicheng Yan,et al.  Multi-Fiber Networks for Video Recognition , 2018, ECCV.

[6]  Lianru Gao,et al.  High-Resolution Aerial Imagery Semantic Labeling with Dense Pyramid Network , 2018, Sensors.

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[9]  Anton van den Hengel,et al.  Real-time Semantic Image Segmentation via Spatial Sparsity , 2017, ArXiv.

[10]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[12]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[14]  Jiangyun Li,et al.  Efficient Patch-Wise Semantic Segmentation for Large-Scale Remote Sensing Images , 2018, Sensors.

[15]  Gang Fu,et al.  Classification for High Resolution Remote Sensing Imagery Using a Fully Convolutional Network , 2017, Remote. Sens..

[16]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Michele Volpi,et al.  Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[18]  Yongyang Xu,et al.  Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters , 2018, Remote. Sens..

[19]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ying Wang,et al.  Gated Convolutional Neural Network for Semantic Segmentation in High-Resolution Images , 2017, Remote. Sens..

[21]  Shiming Xiang,et al.  Learnable Gated Convolutional Neural Network for Semantic Segmentation in Remote-Sensing Images , 2019, Remote. Sens..

[22]  Li Shen,et al.  Deep Feature Fusion with Integration of Residual Connection and Attention Model for Classification of VHR Remote Sensing Images , 2019, Remote. Sens..

[23]  Yuhao Wang,et al.  Dense Semantic Labeling with Atrous Spatial Pyramid Pooling and Decoder for High-Resolution Remote Sensing Imagery , 2018, Remote. Sens..

[24]  Wei Yuan,et al.  Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks , 2018, Remote. Sens..

[25]  Konstantinos Karantzalos,et al.  A Novel Object-Based Deep Learning Framework for Semantic Segmentation of Very High-Resolution Remote Sensing Data: Comparison with Convolutional and Fully Convolutional Networks , 2019, Remote. Sens..

[26]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[27]  Uwe Stilla,et al.  SEMANTIC SEGMENTATION OF AERIAL IMAGES WITH AN ENSEMBLE OF CNNS , 2016 .

[28]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[29]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[30]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[32]  Jon Atli Benediktsson,et al.  Land-Cover Mapping by Markov Modeling of Spatial–Contextual Information in Very-High-Resolution Remote Sensing Images , 2013, Proceedings of the IEEE.

[33]  Yufeng Wang,et al.  ERN: Edge Loss Reinforced Semantic Segmentation Network for Remote Sensing Images , 2018, Remote. Sens..

[34]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[35]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Markus Gerke,et al.  Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen) , 2014 .

[37]  Michael Kampffmeyer,et al.  Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[38]  Xin Pan,et al.  High-Resolution Remote Sensing Image Classification Method Based on Convolutional Neural Network and Restricted Conditional Random Field , 2018, Remote. Sens..

[39]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Menglong Yan,et al.  Semantic pixel labelling in remote sensing images using a deep convolutional encoder-decoder model , 2018 .

[42]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Bing Zhang,et al.  A Review of Remote Sensing Image Classification Techniques: the Role of Spatio-contextual Information , 2014 .

[44]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Eugenio Culurciello,et al.  ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[46]  Sildomar T. Monteiro,et al.  Dense Semantic Labeling of Very-High-Resolution Aerial Imagery and LiDAR with Fully-Convolutional Neural Networks and Higher-Order CRFs , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[47]  Wensheng Cheng,et al.  Context Aggregation Network for Semantic Labeling in Aerial Images , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[48]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[49]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[50]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .