A Deep-Local-Global Feature Fusion Framework for High Spatial Resolution Imagery Scene Classification

High spatial resolution (HSR) imagery scene classification has recently attracted increased attention. The bag-of-visual-words (BoVW) model is an effective method for scene classification. However, it can only extract handcrafted features, and it disregards the spatial layout information, whereas deep learning can automatically mine the intrinsic features as well as preserve the spatial location, but it may lose the characteristic information of the HSR images. Although previous methods based on the combination of BoVW and deep learning have achieved comparatively high classification accuracies, they have not explored the combination of handcrafted and deep features, and they just used the BoVW model as a feature coding method to encode the deep features. This means that the intrinsic characteristics of these models were not combined in the previous works. In this paper, to discover more discriminative semantics for HSR imagery, the deep-local-global feature fusion (DLGFF) framework is proposed for HSR imagery scene classification. Differing from the conventional scene classification methods, which utilize only handcrafted features or deep features, DLGFF establishes a framework integrating multi-level semantics from the global texture feature–based method, the BoVW model, and a pre-trained convolutional neural network (CNN). In DLGFF, two different approaches are proposed, i.e., the local and global features fused with the pooling-stretched convolutional features (LGCF) and the local and global features fused with the fully connected features (LGFF), to exploit the multi-level semantics for complex scenes. The experimental results obtained with three HSR image classification datasets confirm the effectiveness of the proposed DLGFF framework. Compared with the published results of the previous scene classification methods, the classification accuracies of the DLGFF framework on the 21-class UC Merced dataset and 12-class Google dataset of SIRI-WHU can reach 99.76%, which is superior to the current state-of-the-art methods. The classification accuracy of the DLGFF framework on the 45-class NWPU-RESISC45 dataset, 96.37 ± 0.05%, is an increase of about 6% when compared with the current state-of-the-art methods. This indicates that the fusion of the global low-level feature, the local mid-level feature, and the deep high-level feature can provide a representative description for HSR imagery.

[1]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[2]  Ping Tang,et al.  Land-Use Scene Classification Using a Concentric Circle-Structured Multiscale Bag-of-Visual-Words Model , 2014, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[3]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Liangpei Zhang,et al.  Scene Classification Based on the Multifeature Fusion Probabilistic Topic Model for High Spatial Resolution Remote Sensing Imagery , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[5]  Lei Guo,et al.  Remote Sensing Image Scene Classification Using Bag of Convolutional Features , 2017, IEEE Geoscience and Remote Sensing Letters.

[6]  Thomas Blaschke,et al.  A comparison of three image-object methods for the multiscale analysis of landscape structure , 2003 .

[7]  Liangpei Zhang,et al.  Scene Classification Based on the Fully Sparse Semantic Topic Model , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[8]  Thomas Blaschke,et al.  Geographic Object-Based Image Analysis – Towards a new paradigm , 2014, ISPRS journal of photogrammetry and remote sensing : official publication of the International Society for Photogrammetry and Remote Sensing.

[9]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[10]  Jie Geng,et al.  Spectral–Spatial Classification of Hyperspectral Image Based on Deep Auto-Encoder , 2016, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[11]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[12]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Izhar Wallach,et al.  AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery , 2015, ArXiv.

[14]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Ming Cui,et al.  Scene classification based on multifeature probabilistic latent semantic analysis for high spatial resolution remote sensing images , 2015 .

[16]  Anil M. Cheriyadat,et al.  Unsupervised Feature Learning for Aerial Scene Classification , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[17]  Jefersson Alex dos Santos,et al.  Towards better exploiting convolutional neural networks for remote sensing scene classification , 2016, Pattern Recognit..

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[20]  Gui-Song Xia,et al.  Bag-of-Visual-Words Scene Classifier With Local and Global Features for High Spatial Resolution Remote Sensing Imagery , 2016, IEEE Geoscience and Remote Sensing Letters.

[21]  Gui-Song Xia,et al.  Dirichlet-Derived Multiple Topic Scene Classification Model for High Spatial Resolution Remote Sensing Imagery , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[22]  Tao Chen,et al.  Unsupervised Feature Learning for Land-Use Scene Recognition , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[23]  Julie Delon,et al.  Shape-based Invariant Texture Indexing , 2010, International Journal of Computer Vision.

[24]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[25]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Francesca Odone,et al.  Histogram intersection kernel for image classification , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[27]  Yingli Tian,et al.  Pyramid of Spatial Relatons for Scene-Level Land Use Classification , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[28]  Jiwen Lu,et al.  Learning a Discriminative Distance Metric With Label Consistency for Scene Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[29]  Gui-Song Xia,et al.  Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery , 2015, Remote. Sens..

[30]  Shawn D. Newsam,et al.  Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.

[31]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  Cong Lin,et al.  Integrating Multilayer Features of Convolutional Neural Networks for Remote Sensing Scene Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[33]  Antonio J. Plaza,et al.  Adaptive Deep Pyramid Matching for Remote Sensing Scene Classification , 2016, ArXiv.

[34]  Jefersson Alex dos Santos,et al.  Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Chao Huang,et al.  Scene Classification via Triplet Networks , 2018, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[36]  Jie Wang,et al.  Transferring Pre-Trained Deep CNNs for Remote Scene Classification with General Features Learned from Linear PCA Network , 2017, Remote. Sens..

[37]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[38]  Mihai Datcu,et al.  Bridging the Semantic Gap for Satellite Image Annotation and Automatic Mapping Applications , 2011, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[39]  Zhiwu Lu,et al.  Zero-Shot Scene Classification for High Spatial Resolution Remote Sensing Images , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[40]  Yuliya Tarabalka,et al.  Best Merge Region-Growing Segmentation With Integrated Nonadjacent Region Object Aggregation , 2012, IEEE Transactions on Geoscience and Remote Sensing.

[41]  Xiaoqiang Lu,et al.  Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.

[42]  Lei Guo,et al.  Object Detection in Optical Remote Sensing Images Based on Weakly Supervised Learning and High-Level Feature Learning , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[43]  Bei Zhao,et al.  Scene classification via latent Dirichlet allocation using a hybrid generative/discriminative strategy for high spatial resolution remote sensing imagery , 2013 .

[44]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[45]  Yanfei Zhong,et al.  A spectral–structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery , 2016 .

[46]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Luisa Verdoliva,et al.  Land Use Classification in Remote Sensing Images by Convolutional Neural Networks , 2015, ArXiv.