Remote Sensing Image Scene Classification Using Rearranged Local Features

Remote sensing image scene classification is a fundamental problem that aims to automatically label an image with a specific semantic category. Recently, deep learning methods, especially those based on convolutional neural networks (CNNs), have achieved competitive performance on this task. However, most existing CNN methods use only the feature vector of the last fully connected layer. They therefore emphasize global information and ignore the local information in an image. Images that share similar global features often belong to different categories, because the category of an image may depend on its local features rather than on its global feature. To address this problem, this paper proposes a method based on rearranged local features. First, the outputs of the last convolutional layer and the last fully connected layer are used to describe the local and global information, respectively. The remote sensing images are then clustered into several collections using their global features. Within each collection, the local features of an image are rearranged according to their similarities with the local features of the cluster center. In addition, a fusion strategy is proposed that combines global and local features to enhance the image representation. The proposed method outperforms state-of-the-art methods on four public and challenging data sets: UC-Merced, WHU-RS19, Sydney, and AID.
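The pipeline described above (cluster by global features, rearrange local features against the cluster center, then fuse) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: plain k-means for the clustering, the member image nearest to each centroid as the "cluster center", cosine similarity with greedy one-to-one matching for the rearrangement, and concatenation of L2-normalized vectors for the fusion. The function names `kmeans`, `rearrange`, and `represent` are hypothetical, and the features are assumed to be precomputed arrays of shape (N, G) for global and (N, P, D) for local features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans(x, k, n_iter=20, seed=0):
    """Plain k-means (assumed here; the abstract does not name the clustering)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    # Final assignment against the final centers.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers


def rearrange(local, center_local):
    """Reorder the rows of `local` (P, D) to line up with `center_local` (P, D).

    Greedy one-to-one matching on cosine similarity (an assumption): the most
    similar unmatched pair is fixed first, then the next, and so on.
    """
    sim = l2_normalize(local) @ l2_normalize(center_local).T  # sim[i, j]
    order = np.full(len(center_local), -1)
    used = np.zeros(len(local), dtype=bool)
    for flat in np.argsort(-sim, axis=None):  # pairs, most similar first
        i, j = divmod(int(flat), sim.shape[1])
        if not used[i] and order[j] < 0:
            order[j] = i
            used[i] = True
    return local[order]


def represent(global_feats, local_feats, k, seed=0):
    """Cluster on global features, rearrange local features, fuse by concatenation."""
    labels, centroids = kmeans(global_feats, k, seed=seed)
    fused = []
    for n in range(len(global_feats)):
        c = labels[n]
        members = np.flatnonzero(labels == c)
        dist = ((global_feats[members] - centroids[c]) ** 2).sum(axis=1)
        center_idx = members[dist.argmin()]  # member image nearest the centroid
        ordered = rearrange(local_feats[n], local_feats[center_idx])
        fused.append(np.concatenate([
            l2_normalize(global_feats[n]),      # global part
            l2_normalize(ordered.ravel()),      # rearranged local part
        ]))
    return np.stack(fused), labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=(12, 5))       # stand-in for fully connected outputs
    l = rng.normal(size=(12, 9, 16))   # stand-in for 3x3 conv positions, 16 channels
    fused, labels = represent(g, l, k=3)
    print(fused.shape, labels)
```

The fused vectors would then be passed to a classifier. The greedy matching keeps the sketch dependency-free; an optimal assignment could be substituted without changing the rest of the code.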
