SE2Net: semantic segmentation of remote sensing images based on self-attention and edge enhancement modules

Abstract. Semantic segmentation of optical satellite remote sensing images is more challenging than that of natural images, because ground features in different areas differ considerably in texture, shape, topology, and scale, and because dense and sparse arrangements coexist. To alleviate these difficulties and achieve high accuracy, we construct a semantic segmentation framework named the self-attention and edge enhancement network (SE2Net), which addresses two aspects. First, because the self-attention mechanism can capture more useful semantic information by modeling correlations over large neighborhoods, we embed a self-attention module, spatial expectation maximization attention (SEMA), in the network. Second, we use the Laplace operator to extract significant edge information and design an edge enhancement module (EEM) around it. Both SEMA and EEM are embedded in the proposed SE2Net, forming an end-to-end network. To validate the proposed approach, we construct a semantic segmentation dataset (SSD) from Tian-Hui 1 satellite images and conduct extensive experiments on both the SSD and the Gaofen Image Dataset (GID). The results demonstrate the superiority of the proposed method over other state-of-the-art approaches and the effectiveness of the constructed SSD.
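To illustrate the kind of edge information a Laplace operator exposes, the following is a minimal sketch in plain Python. It is not the EEM itself: the 4-neighbour kernel, the replicated border, the additive `enhance` step, and the `weight` parameter are all assumptions made for illustration, and the names `laplace_edges` and `enhance` are hypothetical.

```python
# 4-neighbour discrete Laplacian kernel (an assumed, commonly used choice).
LAPLACE_KERNEL = [[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]]


def laplace_edges(image):
    """Return the absolute Laplacian response of a 2-D image (list of rows).

    Borders are handled by replicating the nearest pixel, so the output
    has the same shape as the input.
    """
    h, w = len(image), len(image[0])

    def px(y, x):
        # Clamp coordinates to the image, i.e. replicate the border.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    edges = []
    for y in range(h):
        row = []
        for x in range(w):
            response = sum(
                LAPLACE_KERNEL[dy][dx] * px(y + dy - 1, x + dx - 1)
                for dy in range(3)
                for dx in range(3)
            )
            row.append(abs(response))
        edges.append(row)
    return edges


def enhance(image, weight=1.0):
    """Add the weighted edge map back onto the image (hypothetical step)."""
    edges = laplace_edges(image)
    return [
        [p + weight * e for p, e in zip(img_row, edge_row)]
        for img_row, edge_row in zip(image, edges)
    ]
```

On an image with a vertical step between two flat regions, the response is nonzero only in the two columns adjacent to the step, which is the boundary information an edge-oriented module would want to emphasize.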