One step beyond bags of features: Visual categorization using components

The bag-of-visual-words (BoW) representation is widely used and accepted for visual categorization. However, this histogram-based image representation ignores spatial information and the correlations among visual words. To tackle these problems, we propose to use image regions called ‘components’ as higher-level visual elements that represent an image in association with the lower-level elements, the ‘visual words’. We then formulate visual categorization as two progressive relationships between a given concept and these two levels of visual elements, i.e., visual-words-to-components and components-to-concept. First, component-level linear SVM classifiers are learned to model the relationship between visual words and components; then the outputs of these SVM classifiers are linearly combined to model the relationship between components and the concept. Experiments on the Scene-15 dataset and the Oxford Flowers dataset demonstrate the effectiveness of the proposed method.
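The two-level scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the toy data, and the hand-set weights are hypothetical, and the sketch assumes each image has already been split into a fixed, aligned list of components, with one linear model per component position. In the paper the component-level models are linear SVMs learned from data, and the abstract does not say how the combination weights are obtained; here both are fixed by hand to keep the example self-contained.

```python
from collections import Counter


def bow_histogram(word_ids, vocab_size):
    """L1-normalized histogram of visual-word indices."""
    total = len(word_ids)
    if total == 0:
        return [0.0] * vocab_size
    counts = Counter(word_ids)
    return [counts.get(w, 0) / total for w in range(vocab_size)]


def linear_score(weights, bias, features):
    """Plain linear decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def concept_score(components, vocab_size, component_models, alphas, beta=0.0):
    """Two-level score: visual words -> components -> concept.

    components:       list of visual-word id lists, one per component
    component_models: list of (weights, bias) pairs, one per component
                      (hand-set stand-ins for learned linear SVMs)
    alphas, beta:     hypothetical linear combination of component scores
    """
    scores = []
    for word_ids, (weights, bias) in zip(components, component_models):
        hist = bow_histogram(word_ids, vocab_size)
        scores.append(linear_score(weights, bias, hist))
    return linear_score(alphas, beta, scores)


# Toy data (made up): two images with the same visual words,
# but with the two components swapped.
VOCAB = 3
image_a = [[0, 0, 1], [2, 2, 2]]
image_b = [[2, 2, 2], [0, 0, 1]]

# A global BoW histogram cannot tell the two images apart.
global_a = bow_histogram([w for comp in image_a for w in comp], VOCAB)
global_b = bow_histogram([w for comp in image_b for w in comp], VOCAB)

# Hand-set component models and combination weights (assumptions).
models = [([1.0, 0.0, -1.0], 0.0), ([-1.0, 0.0, 1.0], 0.0)]
alphas = [0.5, 0.5]

score_a = concept_score(image_a, VOCAB, models, alphas)
score_b = concept_score(image_b, VOCAB, models, alphas)

print(global_a == global_b)  # the global histograms are identical
print(score_a, score_b)      # the component-level scores differ in sign
```

In this toy case the two images share one global histogram, yet the component-level scores separate them, which is the kind of spatial information the abstract says a plain BoW histogram discards.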
