Self-Attention-Based Deep Feature Fusion for Remote Sensing Scene Classification

Remote sensing scene classification aims to automatically assign a specific semantic label to each aerial image. In this letter, we propose a new method, called self-attention-based deep feature fusion (SAFF), which aggregates deep-layer features and emphasizes the weights of complex objects in remote sensing scene images for scene classification. First, a pretrained convolutional neural network (CNN) is applied to extract abstract multilayer feature maps from the original aerial imagery. Then, a nonparametric self-attention layer is proposed for spatial-wise and channel-wise weighting. It enhances the spatial responses of representative objects and makes fuller use of infrequently occurring features, so the resulting features are more discriminative. Finally, the aggregated features are fed into a support vector machine (SVM) for classification. The proposed method is evaluated on several data sets, and the results demonstrate its effectiveness and efficiency for remote sensing scene classification.
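
The abstract does not give the weighting formulas, so the following is only a rough illustration of what a nonparametric spatial-wise and channel-wise weighting with multilayer fusion can look like. It substitutes CroW-style cross-dimensional weighting (Kalantidis et al.) for the SAFF attention layer. The spatial power normalization (a = 2, b = 2), the inverse-sparsity channel weights, the epsilon placement, the L2-normalized concatenation used as the fusion step, the toy "conv4"/"conv5" inputs, and every function name are assumptions made for this sketch; none of them is the authors' formulation. The SVM stage is omitted, and the fused vector would simply be the input to a linear SVM.

```python
import math
from typing import List

# Assumed input: C channels, each an H x W grid of non-negative (post-ReLU)
# activations taken from one layer of a pretrained CNN.
FeatureMaps = List[List[List[float]]]


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v] if n > 0 else list(v)


def spatial_weights(x: FeatureMaps, a: float = 2.0, b: float = 2.0) -> List[List[float]]:
    """CroW-style spatial weights (an assumption, not the SAFF formula):
    sum the responses over channels, then power-normalize the map."""
    h, w = len(x[0]), len(x[0][0])
    s = [[sum(ch[i][j] for ch in x) for j in range(w)] for i in range(h)]
    norm = sum(v ** a for row in s for v in row) ** (1.0 / a)
    if norm == 0.0:
        return [[0.0] * w for _ in range(h)]
    return [[(v / norm) ** (1.0 / b) for v in row] for row in s]


def channel_weights(x: FeatureMaps, eps: float = 1e-6) -> List[float]:
    """Inverse-sparsity channel weights: channels that fire at fewer locations
    receive larger weights. eps keeps the log finite (assumed placement)."""
    q = []
    for ch in x:
        cells = [v for row in ch for v in row]
        q.append(sum(1 for v in cells if v > 0) / len(cells))
    total = sum(q)
    return [math.log((total + eps) / (qk + eps)) for qk in q]


def aggregate_layer(x: FeatureMaps) -> List[float]:
    """Spatially weighted sum pooling per channel, scaled by the channel weight."""
    s = spatial_weights(x)
    omega = channel_weights(x)
    f = [
        omega[k] * sum(s[i][j] * ch[i][j] for i in range(len(ch)) for j in range(len(ch[0])))
        for k, ch in enumerate(x)
    ]
    return l2_normalize(f)


def fuse_layers(layers: List[FeatureMaps]) -> List[float]:
    """Hypothetical fusion step: concatenate per-layer descriptors, renormalize."""
    fused: List[float] = []
    for x in layers:
        fused.extend(aggregate_layer(x))
    return l2_normalize(fused)


# Toy inputs standing in for two CNN layers (2 channels each, 2 x 2 maps).
conv4 = [[[1.0, 0.0], [0.0, 3.0]], [[0.0, 0.0], [2.0, 0.0]]]
conv5 = [[[0.5, 0.5], [0.5, 0.5]], [[0.0, 1.0], [0.0, 0.0]]]
vec = fuse_layers([conv4, conv5])
print(len(vec))  # prints 4
```

In this sketch the second channel of `conv4` is active at only one of four locations, so `channel_weights` gives it a larger weight than the first channel. That is one concrete way to "make fuller use of infrequently occurring features" without any trainable parameters.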
