APDC-Net: Attention Pooling-Based Convolutional Network for Aerial Scene Classification

Deep learning methods have boosted performance on a range of visual tasks. However, aerial scene classification remains challenging: the object distribution and spatial arrangement in aerial scenes are often more complicated than in natural-image scenes. Possible solutions include highlighting the local semantics relevant to the scene label and preserving more discriminative features. To tackle this challenge, in this letter we propose an attention pooling-based densely connected convolutional network (APDC-Net) for aerial scene classification. First, it uses a simplified dense connection structure as the backbone to preserve features from different levels. Then, we propose a trainable pooling operation that down-samples the feature maps and enhances their local semantic representation capability. Finally, we introduce a multi-level supervision strategy so that features from every level can supervise the training process directly. Extensive experiments on three aerial scene classification benchmarks demonstrate that APDC-Net outperforms other state-of-the-art methods with far fewer parameters, and validate the effectiveness of the attention-based pooling and the multi-level supervision strategy.
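The two training-time ideas above, a learned pooling that replaces fixed down-sampling and supervision attached to several feature levels, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption rather than the authors' method: attention scores come from a hypothetical 1×1 channel projection `w`, are softmax-normalised inside each non-overlapping k×k window, and weight the features in that window; multi-level supervision is modelled as an equal-weight sum of cross-entropy losses from one linear head per level. The names `attention_pool`, `multi_level_loss`, `heads` and `w` are illustrative only.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(feat, w, k=2):
    """Down-sample a (C, H, W) feature map by a factor k with attention weights.

    Hypothetical form: the projection `w` (shape (C,)) scores every spatial
    position; scores are softmax-normalised within each non-overlapping k x k
    window and used as weights for averaging the features in that window.
    """
    C, H, W = feat.shape
    assert H % k == 0 and W % k == 0, "H and W must be divisible by k"
    scores = np.tensordot(w, feat, axes=(0, 0))              # (H, W)
    # Regroup the score map into windows: (H/k, W/k, k*k).
    s = scores.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3)
    a = softmax(s.reshape(H // k, W // k, k * k), axis=-1)   # weights per window
    # Regroup the features the same way: (C, H/k, W/k, k*k).
    f = feat.reshape(C, H // k, k, W // k, k).transpose(0, 1, 3, 2, 4)
    f = f.reshape(C, H // k, W // k, k * k)
    return (f * a[None]).sum(axis=-1)                        # (C, H/k, W/k)


def cross_entropy(logits, label):
    """Cross-entropy of a single sample given raw class scores."""
    logits = logits - logits.max()
    return float(np.log(np.exp(logits).sum()) - logits[label])


def multi_level_loss(level_feats, heads, label):
    """Equal-weight sum of per-level classification losses (assumed weighting).

    Each (C_l, H_l, W_l) feature map is globally average-pooled and passed to
    its own hypothetical linear head of shape (num_classes, C_l).
    """
    total = 0.0
    for feat, head in zip(level_feats, heads):
        pooled = feat.mean(axis=(1, 2))                      # (C_l,)
        total += cross_entropy(head @ pooled, label)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))                     # level-1 features
    y = attention_pool(x, rng.standard_normal(8))            # level-2 features
    print(y.shape)  # (8, 8, 8)
    heads = [rng.standard_normal((10, 8)), rng.standard_normal((10, 8))]
    print(multi_level_loss([x, y], heads, label=3))
```

With `w` set to zero every window receives uniform weights and the operation reduces to ordinary k×k average pooling; making `w` trainable lets the network shift weight toward the positions it scores as more informative, which is the intuition behind replacing fixed pooling with a learned one.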