Semantic segmentation for remote sensing images based on an AD-HRNet model

ABSTRACT Semantic segmentation of remote sensing images faces several challenges, including class imbalance, rich context that makes recognition difficult, and blurred boundaries of multi-scale objects. To address these problems, we propose a new model, denoted AD-HRNet, that combines HRNet with attention mechanisms and dilated convolution for the semantic segmentation of remote sensing images. To address class imbalance, AD-HRNet uses an improved weighted cross-entropy function in which the weight of each category is obtained with the median frequency balancing method. A Shuffle-CBAM module with channel attention and spatial attention extracts more global context information from images at the cost of only a slight increase in computation. To address the blurred boundaries caused by multi-scale object segmentation and edge segmentation, we developed an MDC-DUC module that captures the context information of multi-scale objects and the edge information of many irregular objects. Using the Potsdam, Vaihingen, and SAMA-VTOL datasets, we verified the performance of AD-HRNet by comparing it with eight typical semantic segmentation models. Experimental results show that AD-HRNet increases the mIoU to 75.59% and 71.58% on the Potsdam and Vaihingen datasets, respectively.
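
The class-weighting idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard form of median frequency balancing (weight of class c = median class frequency / frequency of class c, where a class's frequency is counted only over images in which it appears); the paper's improved variant is not specified here and may differ. The function names `median_frequency_weights` and `weighted_cross_entropy` are hypothetical and chosen for illustration only.

```python
import math
from statistics import median


def median_frequency_weights(label_maps, num_classes):
    """Standard median frequency balancing (an assumption, not the paper's variant).

    label_maps: list of 2D lists of integer class ids, one per image.
    Returns one weight per class; classes that never appear get weight 0.0.
    """
    pixel_count = [0] * num_classes    # pixels labelled c over all images
    present_total = [0] * num_classes  # total pixels of images that contain c
    for label_map in label_maps:
        counts = [0] * num_classes
        total = 0
        for row in label_map:
            for c in row:
                counts[c] += 1
                total += 1
        for c in range(num_classes):
            if counts[c] > 0:
                pixel_count[c] += counts[c]
                present_total[c] += total
    freq = [
        pixel_count[c] / present_total[c] if present_total[c] else 0.0
        for c in range(num_classes)
    ]
    med = median([f for f in freq if f > 0])
    return [med / f if f > 0 else 0.0 for f in freq]


def weighted_cross_entropy(logits, target, weights):
    """Weighted cross-entropy for a single pixel: -w[target] * log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[target] * (logits[target] - log_sum)


# Toy example: class 1 is rare, so it receives the larger weight.
toy_labels = [[[0, 0], [0, 1]], [[0, 0], [0, 0]]]
toy_weights = median_frequency_weights(toy_labels, num_classes=2)
```

Under this scheme a rare class contributes more to the loss per pixel than a frequent one, which is the intended effect of balancing the class weights.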
