Image Classification Using Spatial Pyramid Coding and Visual Word Reweighting

The ignorance on spatial information and semantics of visual words becomes main obstacles in the bag-of-visual-words (BoW) method for image classification. To address the obstacles, we present an improved BoW representation using spatial pyramid coding (SPC) and visual word reweighting. In SPC procedure, we adopt the sparse coding technique to encode visual features with the spatial constraint. Visual features from the same spatial sub-region of images are collected to generate the visual vocabulary. Additionally, a relaxed but simple solution for semantic embedding into visual words is proposed. We relax the semantic embedding from ideal semantic correspondence to naive semantic purity of visual words, and reweight each visual word according to its semantic purity. Higher weights are given to semantically distinctive visual words, and lower weights to semantically general ones. Experiments on a public dataset demonstrate the effectiveness of the proposed method.

[1]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Gang Hua,et al.  Integrated feature selection and higher-order spatial feature extraction for object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[7]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[8]  Hongping Cai,et al.  Learning weights for codebook in image classification and retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  John K. Tsotsos Analyzing vision at the complexity level , 1990, Behavioral and Brain Sciences.

[10]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Xin Chen,et al.  Real-world visual search is dominated by top-down guidance , 2006, Vision Research.

[13]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.

[15]  Svetlana Lazebnik,et al.  Supervised Learning of Quantizer Codebooks by Information Loss Minimization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[17]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[19]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[21]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[23]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[24]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[27]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[28]  Qi Tian,et al.  Category sensitive codebook construction for object category recognition , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[29]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).