Large patch convolutional neural networks for the scene classification of high spatial resolution imagery

Abstract. The increase of the spatial resolution of remote-sensing sensors helps to capture the abundant details related to the semantics of surface objects. However, it is difficult for the popular object-oriented classification approaches to acquire higher level semantics from the high spatial resolution remote-sensing (HSR-RS) images, which is often referred to as the “semantic gap.” Instead of designing sophisticated operators, convolutional neural networks (CNNs), a typical deep learning method, can automatically discover intrinsic feature descriptors from a large number of input images to bridge the semantic gap. Due to the small data volume of the available HSR-RS scene datasets, which is far away from that of the natural scene datasets, there have been few reports of CNN approaches for HSR-RS image scene classifications. We propose a practical CNN architecture for HSR-RS scene classification, named the large patch convolutional neural network (LPCNN). The large patch sampling is used to generate hundreds of possible scene patches for the feature learning, and a global average pooling layer is used to replace the fully connected network as the classifier, which can greatly reduce the total parameters. The experiments confirm that the proposed LPCNN can learn effective local features to form an effective representation for different land-use scenes, and can achieve a performance that is comparable to the state-of-the-art on public HSR-RS scene datasets.

[1]  Liangpei Zhang,et al.  Scene Classification Based on the Multifeature Fusion Probabilistic Topic Model for High Spatial Resolution Remote Sensing Imagery , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[2]  N. H. C. Yung,et al.  Scene categorization via contextual visual words , 2010, Pattern Recognit..

[3]  Bo Du,et al.  Saliency-Guided Unsupervised Feature Learning for Scene Classification , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[4]  Anil M. Cheriyadat Aerial scene recognition using efficient sparse representation , 2012, ICVGIP '12.

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[8]  Wen Yang,et al.  High-resolution satellite scene classification using a sparse coding based multiple feature combination , 2012 .

[9]  Selim Aksoy,et al.  Scene Classification Using Bag-of-Regions Representations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[11]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[12]  Anil M. Cheriyadat,et al.  Unsupervised Feature Learning for Aerial Scene Classification , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[13]  Gui-Song Xia,et al.  Dirichlet-Derived Multiple Topic Scene Classification Model for High Spatial Resolution Remote Sensing Imagery , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[14]  Junwei Han,et al.  Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA , 2013 .

[15]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[16]  Anil K. Jain,et al.  On image classification: city vs. landscape , 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. No.98EX173).

[17]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[18]  Gui-Song Xia,et al.  Deep Learning for Remote Sensing Image Understanding , 2016, J. Sensors.

[19]  Tom Schaul,et al.  No more pesky learning rates , 2012, ICML.

[20]  Guoyou Wang,et al.  High resolution remote sensing image classification with multiple classifiers based on mixed combining rule , 2007, International Symposium on Multispectral Image Processing and Pattern Recognition.

[21]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[22]  Lu Wang,et al.  Land-use scene classification using multi-scale completed local binary patterns , 2015, Signal, Image and Video Processing.

[23]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[24]  Yanfei Zhong,et al.  A spectral–structural bag-of-features scene classifier for very high spatial resolution remote sensing imagery , 2016 .

[25]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[26]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[27]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[28]  Xiwang Zhang,et al.  Research on classification of high-resolution remote sensing image , 2007, International Symposium on Multispectral Image Processing and Pattern Recognition.

[29]  Mihai Datcu,et al.  Semantic Annotation of Satellite Images Using Latent Dirichlet Allocation , 2010, IEEE Geoscience and Remote Sensing Letters.

[30]  Mihai Datcu,et al.  Latent Dirichlet Allocation for Spatial Analysis of Satellite Images , 2013, IEEE Transactions on Geoscience and Remote Sensing.

[31]  Dewen Hu,et al.  Scene classification using a multi-resolution bag-of-features model , 2013, Pattern Recognit..

[32]  Shawn D. Newsam,et al.  Spatial pyramid co-occurrence for image classification , 2011, 2011 International Conference on Computer Vision.

[33]  Gang Liu,et al.  A Hierarchical Scheme of Multiple Feature Fusion for High-Resolution Satellite Scene Categorization , 2013, ICVS.

[34]  Fei-Fei Li,et al.  Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[35]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[37]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[38]  Shawn D. Newsam,et al.  Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.

[39]  Hong Sun,et al.  Unsupervised Feature Learning Via Spectral Clustering of Multidimensional Patches for Remotely Sensed Scene Classification , 2015, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[40]  Luisa Verdoliva,et al.  Land Use Classification in Remote Sensing Images by Convolutional Neural Networks , 2015, ArXiv.

[41]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[42]  Jianzhong Guo,et al.  Gabor-Filtering-Based Completed Local Binary Patterns for Land-Use Scene Classification , 2015, 2015 IEEE International Conference on Multimedia Big Data.

[43]  Robert Marti,et al.  Which is the best way to organize/classify images by content? , 2007, Image Vis. Comput..

[44]  Thomas Blaschke,et al.  Object based image analysis for remote sensing , 2010 .

[45]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[46]  Xin Chen,et al.  Classification of high spatial resolution remote sensing image using SVM and local spatial statistics Getis-Ord Gi , 2011, International Symposium on Multispectral Image Processing and Pattern Recognition.

[47]  Liangpei Zhang,et al.  The Fisher Kernel Coding Framework for High Spatial Resolution Scene Classification , 2016, Remote. Sens..

[48]  Jiebo Luo,et al.  Improved scene classification using efficient low-level features and semantic cues , 2004, Pattern Recognit..

[49]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[50]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..