Scene semantic classification based on random-scale stretched convolutional neural network for high-spatial resolution remote sensing imagery

Convolutional neural network (CNN) has outstanding performance on nature image classification, such as facial recognition, ImageNet Large Scale Visual Recognition Challenge. However, due to scale variation of the same object in scene, it's difficult to directly utilize CNN for remote sensing image classification. In order to solve this problem, scene classification based on a random-scale stretched convolutional neural network (SRSCNN) for HSR remote sensing imagery is proposed in this paper. In the proposed method, the patches with random scale is cropped from image and stretched to the specified scale as input to train CNN, and in order to further improve the performance of CNN, the proposed method classifies an image multiple times to decide its label by voting. Experimental results using two datasets, i.e. the UC Merced dataset, Google Dataset of SIRI-WHU, show better performance than the traditional scene classification methods.

[1]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[2]  Liangpei Zhang,et al.  Scene Classification Based on the Multifeature Fusion Probabilistic Topic Model for High Spatial Resolution Remote Sensing Imagery , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[3]  Anil M. Cheriyadat,et al.  Unsupervised Feature Learning for Aerial Scene Classification , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[4]  Bo Du,et al.  Saliency-Guided Unsupervised Feature Learning for Scene Classification , 2015, IEEE Transactions on Geoscience and Remote Sensing.

[5]  Bei Zhao,et al.  Scene classification via latent Dirichlet allocation using a hybrid generative/discriminative strategy for high spatial resolution remote sensing imagery , 2013 .

[6]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ping Tang,et al.  A 2-D wavelet decomposition-based bag-of-visual-words model for land-use scene classification , 2014 .

[8]  Ping Tang,et al.  Land-Use Scene Classification Using a Concentric Circle-Structured Multiscale Bag-of-Visual-Words Model , 2014, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

[9]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[10]  Bo Du,et al.  Scene Classification via a Gradient Boosting Random Convolutional Network Framework , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Shawn D. Newsam,et al.  Spatial pyramid co-occurrence for image classification , 2011, 2011 International Conference on Computer Vision.

[13]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.