Improving Image Classification Using Semantic Attributes

The Bag-of-Words (BoW) model—commonly used for image classification—has two strong limitations: on one hand, visual words are lacking of explicit meanings, on the other hand, they are usually polysemous. This paper proposes to address these two limitations by introducing an intermediate representation based on the use of semantic attributes. Specifically, two different approaches are proposed. Both approaches consist in predicting a set of semantic attributes for the entire images as well as for local image regions, and in using these predictions to build the intermediate level features. Experiments on four challenging image databases (PASCAL VOC 2007, Scene-15, MSRCv2 and SUN-397) show that both approaches improve performance of the BoW model significantly. Moreover, their combination achieves the state-of-the-art results on several of these image databases.

[1]  Wayne D. Gray,et al.  Basic objects in natural categories , 1976, Cognitive Psychology.

[2]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[4]  Patrick Haffner,et al.  Support vector machines for histogram-based image classification , 1999, IEEE Trans. Neural Networks.

[5]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[7]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[8]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[12]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[13]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[15]  Bernt Schiele,et al.  International Journal of Computer Vision manuscript No. (will be inserted by the editor) Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2022 .

[16]  Ming Yang,et al.  Discovery of Collocation Patterns: from Visual Words to Visual Phrases , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[18]  Alexei A. Efros,et al.  Unsupervised discovery of visual object class hierarchies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Qi Tian,et al.  Visual Synset: Towards a higher-level visual representation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Tsuhan Chen,et al.  Efficient Kernels for identifying unbounded-order spatial features , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Mubarak Shah,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Fahad Shahbaz Khan,et al.  Top-down color attention for object recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Gang Wang,et al.  Learning image similarity from Flickr groups using Stochastic Intersection Kernel MAchines , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Wen Gao,et al.  Group-sensitive multiple kernel learning for object categorization , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  Ivan Laptev,et al.  Recognizing human actions in still images: a study of bag-of-features and part-based representations , 2010, BMVC.

[31]  Florent Perronnin,et al.  Large-scale image categorization with explicit data embedding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Deepu Rajan,et al.  Embedding Visual Words into Concept Space for Action and Scene Recognition , 2010, BMVC.

[33]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[34]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[35]  Frédéric Jurie,et al.  Improving object classification using semantic attributes , 2010, BMVC.

[36]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Ivan Laptev,et al.  Improving bag-of-features action recognition with non-local cues , 2010, BMVC.

[39]  Wen Gao,et al.  Towards semantic embedding in visual vocabulary , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Shin'ichi Satoh,et al.  Building Compact Local Pairwise Codebook with Joint Feature Space Clustering , 2010, ECCV.

[41]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[42]  Fei-Fei Li,et al.  Building and using a semantivisual image hierarchy , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[44]  Frédéric Jurie,et al.  Visual word disambiguation by semantic contexts , 2011, 2011 International Conference on Computer Vision.

[45]  Thomas Deselaers,et al.  Visual and semantic similarity in ImageNet , 2011, CVPR 2011.