Fusing two convolutional neural networks for high-resolution scene classification

This paper presents a novel deep convolutional feature fusion (ConvFF) approach for high-resolution scene classification that builds on the well-known deep convolutional neural network (ConvNet) approach. The proposed ConvFF approach starts by generating an initial feature representation of the scenes under exploration from two deep ConvNets, each pre-trained on a different large labeled dataset: one consisting mainly of objects and the other mainly of scenes. After the pre-training phase, we fine-tune the two deep ConvNets in a supervised manner using the target training images. We then fuse the two types of convolutional features extracted from the last fully connected (FC) layer of each network. Finally, the fused convolutional features are fed as input to an SVM classifier for classification. The proposed method is evaluated on two challenging high-resolution scene datasets. Experimental results show that the proposed method can effectively extract complementary features of the scenes and capture local spatial patterns, consistently outperforming several state-of-the-art methods.
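
The fusion step can be illustrated with a minimal sketch. The abstract does not state how the two FC-layer feature vectors are combined, so the code below assumes the simplest option: L2-normalize each vector and concatenate the two. The function names `l2_normalize` and `fuse_features` and the toy feature values are hypothetical stand-ins, not the authors' implementation. In practice the inputs would be activations from the two fine-tuned ConvNets, and the fused vectors would be passed to an SVM classifier.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length.

    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(object_feats, scene_feats):
    """Fuse per-image features from two networks.

    ASSUMPTION: fusion is done by normalizing each network's feature
    vector and concatenating them; the source does not specify this.
    """
    if len(object_feats) != len(scene_feats):
        raise ValueError("both networks must provide features for the same images")
    return [
        l2_normalize(obj) + l2_normalize(scn)
        for obj, scn in zip(object_feats, scene_feats)
    ]


# Toy stand-ins for last-FC-layer activations of two images.
# Real features would come from the object-pretrained and
# scene-pretrained ConvNets after fine-tuning.
object_feats = [[3.0, 4.0], [0.0, 2.0]]
scene_feats = [[1.0, 0.0, 0.0], [0.0, 6.0, 8.0]]

fused = fuse_features(object_feats, scene_feats)
```

Each fused vector has a length equal to the sum of the two input dimensions (here 2 + 3 = 5). Normalizing before concatenation keeps either network from dominating the fused representation purely through activation scale, which is one common reason to prefer it over raw concatenation.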
