Evaluation of second-order visual features for land-use classification

This paper investigates the use of recent visual features based on second-order statistics, as well as new processing techniques that improve the quality of these features. More specifically, we present and evaluate Fisher Vectors (FV), Vectors of Locally Aggregated Descriptors (VLAD), and Vectors of Locally Aggregated Tensors (VLAT). These representations are combined with several normalization techniques, such as power-law normalization and orthogonalization/whitening of descriptor spaces. Results on the UC Merced land-use dataset show the relevance of these new methods for land-use classification, as well as a significant improvement over Bag-of-Words.
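
As an illustration of one of the aggregation schemes named above, the following is a minimal sketch of VLAD followed by power-law and L2 normalization. It is not the paper's implementation: the function name `vlad`, the exponent `alpha=0.5`, and the assumption that a codebook `centers` has already been learned (for example by k-means) are all choices made for this example.

```python
import numpy as np


def vlad(descriptors, centers, alpha=0.5):
    """Aggregate local descriptors (n, d) into a VLAD signature of length k * d.

    `centers` is a (k, d) codebook assumed to be learned beforehand.
    `alpha` is the power-law exponent (0.5 is an assumed, common choice).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centers = np.asarray(centers, dtype=float)
    k, d = centers.shape

    # Assign each descriptor to its nearest center (squared Euclidean distance).
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Sum the residuals (descriptor minus center) within each cluster.
    signature = np.zeros((k, d))
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members) > 0:
            signature[j] = (members - centers[j]).sum(axis=0)

    # Power-law normalization: signed power applied component-wise.
    v = signature.ravel()
    v = np.sign(v) * np.abs(v) ** alpha

    # L2 normalization, guarding against an all-zero signature.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The power-law step dampens components that dominate the signature because many similar descriptors fell into the same cluster; the final L2 normalization makes signatures comparable with a dot product or a linear classifier.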
