Visual Word Ambiguity

This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.

[1]  P. J. Green,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[2]  Nuno Vasconcelos,et al.  A unifying view of image similarity , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[3]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[5]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[6]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[7]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[8]  Aleksandra Mojsilovic,et al.  Semantic-Friendly Indexing and Quering of Images Based on the Extraction of the Objective Semantic Cues , 2004, International Journal of Computer Vision.

[9]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[10]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[14]  Cor J. Veenman,et al.  Robust Scene Categorization by Learning Image Statistics in Context , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[15]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[16]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[17]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..

[19]  Jiebo Luo,et al.  Factor Graphs for Region-based Whole-scene Classification , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[20]  Bernt Schiele,et al.  International Journal of Computer Vision manuscript No. (will be inserted by the editor) Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2022 .

[21]  Bernt Schiele,et al.  Multiple Object Class Detection with a Generative Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Andrew Zisserman,et al.  An Exemplar Model for Learning Object Classes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Arnold W. M. Smeulders,et al.  The Distribution Family of Similarity Distances , 2007, NIPS.

[24]  Jean-Marc Odobez,et al.  A Thousand Words in a Scene , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Antonio Torralba,et al.  Describing Visual Scenes Using Transformed Objects and Parts , 2008, International Journal of Computer Vision.

[26]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[27]  Cordelia Schmid,et al.  Vector Quantizing Feature Space with a Regular Lattice , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[28]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[29]  Chong-Wah Ngo,et al.  Towards optimal bag-of-features for object categorization and semantic video retrieval , 2007, CIVR '07.

[30]  Mubarak Shah,et al.  Scene Modeling Using Co-Clustering , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[31]  Rong Jin,et al.  Discriminative Cluster Refinement: Improving Object Category Recognition Given Limited Training Data , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[33]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[34]  Bill Triggs,et al.  Multilevel Image Coding with Hyperfeatures , 2008, International Journal of Computer Vision.

[35]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Koen E. A. van de Sande,et al.  Evaluation of color descriptors for object and scene recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Florent Perronnin,et al.  Universal and Adapted Vocabularies for Generic Visual Categorization , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[42]  Tsuhan Chen,et al.  Learning class-specific affinities for image labelling , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.