TMNet: A Two-Branch Multi-Scale Semantic Segmentation Network for Remote Sensing Images

Pixel-level information from remote sensing images is valuable in many fields. CNNs are strong at extracting image features, but because convolution is a local operation, they cannot directly capture global feature information or contextual semantic interaction, which limits the accuracy a pure CNN model can reach in semantic segmentation of remote sensing images. Inspired by the Swin Transformer and its global feature encoding capability, we design a two-branch multi-scale semantic segmentation network (TMNet) for remote sensing images. The network uses a dual-encoder, single-decoder structure. The Swin Transformer is used to strengthen the extraction of global feature information. A multi-scale feature fusion module (MFM) merges shallow spatial features from images of different scales into deep features. In addition, a feature enhancement module (FEM) and a channel enhancement module (CEM) are proposed and added to the dual encoder to enhance feature extraction. Experiments on the WHDLD and Potsdam datasets verify the effectiveness of TMNet.
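
The locality of convolution mentioned above can be made concrete with the standard receptive-field recurrence: each layer with kernel size k widens the receptive field by (k - 1) times the product of the strides of all earlier layers. The following is a minimal sketch of that arithmetic only; the layer stacks are hypothetical examples chosen for illustration and are not the TMNet encoder configuration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1    # one output unit initially corresponds to one input pixel
    jump = 1  # spacing, in input pixels, between neighbouring units at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack: five 3x3 convolutions with stride 1.
plain = [(3, 1)] * 5

# Hypothetical stack: five 3x3 convolutions, each followed by 2x2 pooling (stride 2).
pooled = [(3, 1), (2, 2)] * 5

print(receptive_field(plain))   # 11
print(receptive_field(pooled))  # 94
```

Without downsampling, the receptive field grows by only two pixels per 3x3 layer, so a unit sees a global view of a large image only after many layers or aggressive pooling. Self-attention, by contrast, relates all positions inside its attention window within a single layer.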
