Global-Local Attention Network for Aerial Scene Classification

The performance of aerial scene classification relies heavily on the discriminative power of the feature representations extracted from high-spatial-resolution remotely sensed imagery. Convolutional neural networks (CNNs) have recently been applied to learn image features adaptively at different levels of abstraction, rather than relying on handcrafted features, and have achieved state-of-the-art performance. However, most of these networks focus on multi-stage global feature learning and neglect local information, which plays an important role in scene recognition. To address this issue, a novel end-to-end global-local attention network (GLANet) is proposed to capture both global and local information for aerial scene classification. The fully connected (FC) layers of VGGNet are replaced by a global attention (GA) branch and a local attention (LA) branch: the former learns global information, while the latter learns local semantic information via attention mechanisms. During each training iteration, the labels of the input images are predicted from the local features, the global features, and their concatenation using softmax. Based on these separate predictions, two auxiliary loss functions are computed and imposed on the network to strengthen supervision during learning. Experimental results on three challenging large-scale scene datasets demonstrate the effectiveness of the proposed global-local attention network.
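The multi-head supervision described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which two of the three predictions carry the auxiliary losses or how the terms are weighted, so the sketch assumes that the prediction from the concatenated features gives the main loss, that the global and local predictions give the two auxiliary losses, and that a hypothetical `aux_weight` scales them. The function name `glanet_style_loss` and its inputs (per-head class scores for a single image) are likewise illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true class."""
    return -math.log(softmax(logits)[label])


def glanet_style_loss(global_logits, local_logits, fused_logits, label,
                      aux_weight=0.5):
    """Combine one main loss and two auxiliary losses for a single image.

    Assumptions (not taken from the abstract): the fused head provides the
    main loss, the global and local heads provide the auxiliary losses,
    and aux_weight is a hypothetical scaling factor.
    """
    main = cross_entropy(fused_logits, label)
    aux_global = cross_entropy(global_logits, label)
    aux_local = cross_entropy(local_logits, label)
    return main + aux_weight * (aux_global + aux_local)


# Example: three classes, true class index 0.
loss = glanet_style_loss(
    global_logits=[2.0, 0.5, 0.1],
    local_logits=[1.5, 1.0, 0.2],
    fused_logits=[3.0, 0.3, 0.1],
    label=0,
)
```

Under these assumptions, each branch receives a gradient signal from its own prediction as well as from the fused prediction, which is one plausible reading of how the auxiliary losses "enhance the supervision" of the network.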
