Pre-Trained AlexNet Architecture with Pyramid Pooling and Supervision for High Spatial Resolution Remote Sensing Image Scene Classification

The rapid development of high spatial resolution (HSR) remote sensing imagery techniques not only provides a considerable number of datasets for scene classification tasks but also demands an appropriate scene classification method when only a limited number of labeled samples is available. AlexNet, a relatively simple convolutional neural network (CNN) architecture, has achieved great success in scene classification tasks and has proven to be an excellent foundation for hierarchical, automatic scene classification. However, current HSR remote sensing scene classification datasets are typically small and contain simple categories, and the limited number of annotated samples can easily lead to non-convergence during training. For HSR remote sensing imagery, multi-scale information from the same scene can represent the scene semantics to a certain extent, but an efficient way to fuse this information is lacking. Meanwhile, the current pre-trained AlexNet architecture lacks appropriate supervision to enhance its performance, which easily causes overfitting. In this paper, an improved pre-trained AlexNet architecture named pre-trained AlexNet-SPP-SS is proposed, which incorporates scale pooling, in the form of spatial pyramid pooling (SPP), and side supervision (SS) to address these two problems. Extensive experiments conducted on the UC Merced dataset and the Google Image dataset of SIRI-WHU demonstrate that the proposed pre-trained AlexNet-SPP-SS model is superior to the original AlexNet architecture as well as to traditional scene classification methods.
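
The spatial pyramid pooling idea named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the choice of max pooling are assumptions made for illustration, since the abstract does not specify them. The sketch shows how feature maps of different spatial sizes are pooled into a vector of fixed length, which is what allows multi-scale inputs to feed the same fully connected layers.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    For each pyramid level n (an assumed setting), the map is split into
    an n x n grid of bins and the per-channel maximum of each bin is kept.
    The output length is C * sum(n * n for n in levels), independent of H, W.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the bin start, ceil for the bin end, so the bins
            # cover the whole map and are never empty.
            r0 = (i * height) // n
            r1 = -((-(i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -((-(j + 1) * width) // n)
                bin_max = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
                pooled.append(bin_max)
    return np.concatenate(pooled)


# Two feature maps with different spatial sizes but the same channel count.
rng = np.random.default_rng(0)
small = rng.standard_normal((3, 13, 13))
large = rng.standard_normal((3, 10, 17))
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

Both `vec_small` and `vec_large` have the same length (3 channels times 1 + 4 + 16 bins), even though the inputs differ in height and width. Side supervision is not sketched here because the abstract gives no detail on how the auxiliary loss is attached.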
