MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

Semantic segmentation of urban remote sensing images is one of the most crucial tasks in the field of remote sensing. High-resolution remote sensing images contain rich information on ground objects, such as their shape, location, and boundaries. Segmenting these images is exceedingly challenging because the ground objects exhibit large intraclass variance and low interclass variance. In this article, we propose a multiscale hierarchical channel attention fusion network based on a transformer and a CNN, which we name the multiscale channel attention fusion network (MCAFNet). MCAFNet uses ResNet-50 and ViT-B/16 to learn the global–local context, which strengthens the semantic feature representation. Specifically, a global–local transformer block (GLTB) is deployed in the encoder stage. This design handles image details at low resolution and extracts global image features better than previous methods. In the decoder, a channel attention optimization module and a fusion module are added to better integrate high- and low-dimensional feature maps, which enhances the network's ability to capture small-scale semantic information. The proposed method is evaluated on the ISPRS Vaihingen and Potsdam datasets. Both quantitative and qualitative evaluations show that MCAFNet performs competitively with mainstream methods. In addition, we performed extensive ablation experiments on the Vaihingen dataset to test the effectiveness of the individual network components.
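The abstract does not specify the internals of the channel attention and fusion modules, so the following is only a minimal sketch of the general idea, not the MCAFNet implementation. It assumes a squeeze-and-excitation style gate and additive fusion; the names `channel_weights`, `upsample_nearest`, `fuse`, and the weight matrices `w1` and `w2` are hypothetical. A lower-resolution feature map is upsampled, added to a higher-resolution one, and the fused channels are then reweighted by attention gates.

```python
import math

def global_avg_pool(feat):
    """Squeeze a [C][H][W] feature map to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def channel_weights(feat, w1, w2):
    """Assumed SE-style gate: pool -> linear -> ReLU -> linear -> sigmoid."""
    squeezed = global_avg_pool(feat)
    hidden = [max(0.0, sum(w * x for w, x in zip(row, squeezed))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a [C][H][W] map by an integer factor."""
    return [[[v for v in row for _ in range(factor)]
             for row in ch for _ in range(factor)]
            for ch in feat]

def fuse(shallow, deep, w1, w2):
    """Add an upsampled deep map to a shallow map, then gate each channel.

    Both maps are assumed to have the same number of channels, and the
    shallow map's height is assumed to be an integer multiple of the deep one.
    """
    factor = len(shallow[0]) // len(deep[0])
    up = upsample_nearest(deep, factor)
    summed = [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
              for ca, cb in zip(shallow, up)]
    gates = channel_weights(summed, w1, w2)
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, summed)]
```

In a real network the gate weights would be learned and the fusion would likely involve convolutions; here `w1` has shape [hidden][C] and `w2` has shape [C][hidden], passed in as plain nested lists.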
