Object Classification of Aerial Images With Bag-of-Visual Words

This letter presents a Bag-of-Visual Words (BOV) representation for object-based classification in land-use/cover mapping of high-spatial-resolution aerial photographs. The method is introduced to handle the special characteristics of aerial images, i.e., the variability of their spectral and spatial content. Specifically, patch detection and description are used to divide objects that comprise multiple homogeneous components into subregions and to represent those subregions. The BOV representation is then constructed from the occurrence statistics of visual words, which are learned from the training data set. An evaluation of various patch descriptors shows that a combination of spectral and texture features is a satisfactory choice. Furthermore, a threshold-based method is employed to reduce the impact of outliers in the test data on classification. Experiments on an aerial-image data set show that the proposed BOV representation yields better classification performance than low-level features such as spectral and texture features.
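The encoding step described above, counting how often each learned visual word occurs among an object's patches, can be illustrated with a minimal sketch. It is not the letter's implementation. The function name `bov_histogram`, the Euclidean nearest-word assignment, and the `max_dist` outlier rule are all assumptions made for illustration, since the abstract does not specify how the threshold is applied. The vocabulary is assumed to have been learned beforehand from training patches (for example by clustering).

```python
from collections import Counter
from math import dist


def bov_histogram(descriptors, vocabulary, max_dist=None):
    """Encode one object as a normalized histogram of visual-word counts.

    descriptors: patch feature vectors extracted from one object
    vocabulary:  visual-word centers, assumed learned from training data
    max_dist:    hypothetical outlier threshold (an assumption, not the
                 letter's rule); patches farther than this from every
                 visual word are ignored
    """
    counts = Counter()
    for d in descriptors:
        # Assign the patch to its nearest visual word (Euclidean distance).
        word, gap = min(
            ((i, dist(d, w)) for i, w in enumerate(vocabulary)),
            key=lambda pair: pair[1],
        )
        if max_dist is not None and gap > max_dist:
            continue  # treat the patch as an outlier and skip it
        counts[word] += 1

    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    # Normalize so objects with different patch counts are comparable.
    return [counts[i] / total for i in range(len(vocabulary))]


# Toy 2-D descriptors; real ones would combine spectral and texture features.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
patches = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (9.0, 9.0)]

plain = bov_histogram(patches, vocabulary)
filtered = bov_histogram(patches, vocabulary, max_dist=2.0)
```

In this toy example, the patch at (9.0, 9.0) lies far from every visual word. Without a threshold it is counted toward the third word; with `max_dist=2.0` it is dropped. The resulting histograms would then serve as input to a classifier in place of raw low-level features.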
