AANet: an attention-based alignment semantic segmentation network for high spatial resolution remote sensing images

ABSTRACT In this paper, we present an efficient network that tackles three critical problems in high spatial resolution (HSR) remote sensing image segmentation: feature misalignment, insufficient extraction of contextual information, and various class imbalance issues. Specifically, we propose a novel Feature Alignment Block (FAB) that suppresses misalignment under the guidance of an anchor map. To extract sufficient contextual information, we design a Contextual Augmentation Block (CAB) that augments features at different semantic levels. Finally, we present an Annealing Online Hard Example Mining (AOHEM) strategy that handles the class imbalance issues by dynamically adjusting the focus of the network during training. We apply these designs to FPN to form our Attention-based Alignment Network (AANet). Experimental results show that the proposed method achieves promising results on the challenging iSAID and Vaihingen datasets, with a better trade-off between accuracy and complexity.
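The abstract does not give the AOHEM formulation, so the following is only a minimal sketch of the general idea of combining online hard example mining with an annealed schedule. Everything in it is an assumption made for illustration: the function names `annealed_keep_ratio` and `ohem_loss`, the cosine schedule, the `start` and `end` values, and the choice to keep all pixels early and narrow to the hardest ones later.

```python
import math


def annealed_keep_ratio(step, total_steps, start=1.0, end=0.3):
    """Hypothetical schedule: cosine-anneal the kept fraction from `start` to `end`.

    Assumes total_steps > 0. The paper's actual schedule is not stated in the abstract.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress clamped to [0, 1]
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))


def ohem_loss(pixel_losses, keep_ratio):
    """Average the loss over the hardest `keep_ratio` fraction of pixels."""
    k = max(1, int(round(keep_ratio * len(pixel_losses))))  # keep at least one pixel
    hardest = sorted(pixel_losses, reverse=True)[:k]  # largest per-pixel losses first
    return sum(hardest) / k


# Usage: per-pixel losses for a toy batch of four pixels (made-up values).
losses = [0.1, 0.2, 0.9, 1.5]
early = ohem_loss(losses, annealed_keep_ratio(0, 100))    # all pixels contribute
late = ohem_loss(losses, annealed_keep_ratio(100, 100))   # only the hardest pixel(s)
```

Under this assumed schedule, easy pixels from dominant classes contribute to the loss early in training and are gradually dropped, so later updates concentrate on hard pixels, which often belong to rare classes.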
