Scene Classification of Remote Sensing Images Based on Saliency Dual Attention Residual Network

Scene classification of high-resolution Remote Sensing Images (RSI) is one of the basic challenges in RSI interpretation. Existing scene classification methods based on deep learning have achieved impressive performance. However, since RSI commonly contain various types of ground objects and complex backgrounds, most methods cannot focus on the salient features of a scene, which limits classification performance. To address this issue, we propose a novel Saliency Dual Attention Residual Network (SDAResNet) that extracts both cross-channel and spatial saliency information for scene classification of RSI. More specifically, the proposed SDAResNet consists of spatial attention and channel attention: spatial attention is embedded in the low-level features to emphasize salient location information and suppress background information, and channel attention is integrated into the high-level features to extract salient, meaningful information. Additionally, several image classification tricks are used to further improve classification accuracy. Finally, extensive experiments on two challenging benchmark RSI datasets demonstrate that our method significantly outperforms most state-of-the-art approaches.
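The division of labor between the two attention types can be illustrated with a minimal NumPy sketch. It is not the SDAResNet implementation: the abstract does not specify the internal design of either module, so the sketch assumes a spatial mask built from channel-wise average and maximum maps and a squeeze-and-excitation style channel gate. The function names, weight shapes, and feature-map sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight each spatial location of a (C, H, W) feature map.

    Assumed design: pool across channels (mean and max), combine the two
    maps with scalar weights, and squash to a (H, W) mask in (0, 1).
    """
    avg_map = feat.mean(axis=0)  # (H, W)
    max_map = feat.max(axis=0)   # (H, W)
    mask = sigmoid(w_avg * avg_map + w_max * max_map + bias)
    return feat * mask[None, :, :], mask


def channel_attention(feat, w1, w2):
    """Reweight each channel of a (C, H, W) feature map.

    Assumed design: global average pooling, a two-layer bottleneck
    (w1: (C // r, C), w2: (C, C // r)) with ReLU, then a sigmoid gate.
    """
    squeezed = feat.mean(axis=(1, 2))        # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU, (C // r,)
    weights = sigmoid(w2 @ hidden)           # (C,)
    return feat * weights[:, None, None], weights


# Hypothetical feature maps: low-level (few channels, large spatial size)
# and high-level (many channels, small spatial size).
rng = np.random.default_rng(0)
low_level = rng.standard_normal((16, 32, 32))
high_level = rng.standard_normal((64, 8, 8))

# Random stand-ins for learned bottleneck weights (reduction ratio r = 4).
w1 = rng.standard_normal((16, 64)) * 0.1
w2 = rng.standard_normal((64, 16)) * 0.1

low_out, mask = spatial_attention(low_level)
high_out, weights = channel_attention(high_level, w1, w2)
```

In this sketch the spatial mask is applied where spatial resolution is still high, so it can down-weight background locations, while the channel gate is applied where the channels carry more semantic content. In a trained network the scalar and matrix weights would be learned rather than fixed or random.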
