Two-Level Feature Representation for Aerial Scene Classification

Effective scene representation is a fundamental part of high-resolution scene classification systems. In this letter, we present a holistic scene representation method, the two-level feature representation (TLFR) model, which combines low-level and high-level features. Low-level features are obtained by computing the residual between each local descriptor and its corresponding visual word; high-level features are obtained using a proposed selection-constrained sparse coding method. Low-level features within a cluster are aggregated by sum pooling, whereas high-level features are fused by max pooling. The holistic scene representation is then generated by incorporating both levels of features into the bag-of-visual-words framework. Experimental results show that the TLFR model is robust to translation and rotation variations and demonstrates promising performance on the Land Use and Land Cover Database data set and a newly released Singapore data set.
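
The two pooling steps described above can be illustrated with a minimal NumPy sketch. It is not the letter's implementation: the function names, the codebook, and the parameter `n_selected` are hypothetical, and because the abstract does not specify the selection-constrained sparse coding method, the high-level part substitutes a simple stand-in (soft assignment restricted to the few nearest visual words). Only the residual computation, the sum pooling, and the max pooling follow the description directly.

```python
import numpy as np


def squared_distances(descriptors, codebook):
    """Squared Euclidean distances, shape (n_descriptors, n_words)."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2)


def low_level_features(descriptors, codebook):
    """Residual between each descriptor and its nearest visual word,
    sum-pooled per word (cluster) and flattened."""
    nearest = squared_distances(descriptors, codebook).argmin(axis=1)
    pooled = np.zeros_like(codebook, dtype=float)
    for k in range(codebook.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            pooled[k] = (members - codebook[k]).sum(axis=0)
    return pooled.ravel()


def high_level_features_standin(descriptors, codebook, n_selected=2):
    """STAND-IN for the proposed sparse coding (an assumption, not the
    letter's method): each descriptor gets nonzero weights only on its
    n_selected nearest words; codes are then max-pooled per word."""
    d2 = squared_distances(descriptors, codebook)
    idx = np.argsort(d2, axis=1)[:, :n_selected]
    rows = np.arange(d2.shape[0])[:, None]
    selected = d2[rows, idx]
    # Subtract the row minimum before exponentiating to avoid underflow.
    weights = np.exp(-(selected - selected.min(axis=1, keepdims=True)))
    codes = np.zeros_like(d2, dtype=float)
    codes[rows, idx] = weights / weights.sum(axis=1, keepdims=True)
    return codes.max(axis=0)


def scene_vector(descriptors, codebook, n_selected=2):
    """Concatenate both feature levels into one scene representation."""
    return np.concatenate([
        low_level_features(descriptors, codebook),
        high_level_features_standin(descriptors, codebook, n_selected),
    ])


# Toy data: three 2-D descriptors and a two-word codebook.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]])
vector = scene_vector(descriptors, codebook, n_selected=1)
```

In this toy example the first two descriptors fall into the first cluster and the third into the second, so the low-level part sums two residuals for the first word and one for the second. In practice the descriptors would come from a local feature extractor and the codebook from clustering, neither of which is shown here.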
