Spatial histograms of soft pairwise similar patches to improve the bag-of-visual-words model

A new approach to improve image representation for category level classification.We encode pairwise relative spatial information of patches in the bag of word model.A simple approach complementary to the Spatial Pyramid Representation (SPR).Can be combined with SPR and outperforms other existing spatial methods.Experimental validation of the approach is shown on 5 challenging datasets. In the context of category level scene classification, the bag-of-visual-words model (BoVW) is widely used for image representation. This model is appearance based and does not contain any information regarding the arrangement of the visual words in the 2D image space. To overcome this problem, recent approaches try to capture information about either the absolute or the relative spatial location of visual words. In the first category, the so-called Spatial Pyramid Representation (SPR) is very popular thanks to its simplicity and good results. Alternatively, adding information about occurrences of relative spatial configurations of visual words was proven to be effective but at the cost of higher computational complexity, specifically when relative distance and angles are taken into account. In this paper, we introduce a novel way to incorporate both distance and angle information in the BoVW representation. The novelty is first to provide a computationally efficient representation adding relative spatial information between visual words and second to use a soft pairwise voting scheme based on the distance in the descriptor space. Experiments on challenging data sets MSRC-2, 15Scene, Caltech101, Caltech256 and Pascal VOC 2007 demonstrate that our method outperforms or is competitive with concurrent ones. We also show that it provides important complementary information to the spatial pyramid matching and can improve the overall performance.

[1]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[2]  Edmond Zhang,et al.  Improving Bag-of-Words model with spatial information , 2010, 2010 25th International Conference of Image and Vision Computing New Zealand.

[3]  Gang Hua,et al.  Integrated feature selection and higher-order spatial feature extraction for object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Ming Yang,et al.  Discovery of Collocation Patterns: from Visual Words to Visual Phrases , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Qi Tian,et al.  Visual Synset: Towards a higher-level visual representation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[7]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[8]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  Frédéric Jurie,et al.  Visual word disambiguation by semantic contexts , 2011, 2011 International Conference on Computer Vision.

[10]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[11]  Yasuo Kuniyoshi,et al.  Improving Local Descriptors by Embedding Global and Local Spatial Information , 2010, ECCV.

[12]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  Frédéric Jurie,et al.  Modeling spatial layout with fisher vectors for image categorization , 2011, 2011 International Conference on Computer Vision.

[14]  Sangkyum Kim,et al.  DisIClass: discriminative frequent pattern-based image classification , 2010, MDMKDD '10.

[15]  Tsuhan Chen,et al.  Image retrieval with geometry-preserving visual phrases , 2011, CVPR 2011.

[16]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[18]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[19]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[20]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[21]  Gang Hua,et al.  Descriptive visual words and visual phrases for image applications , 2009, ACM Multimedia.

[22]  Lei Wang,et al.  In defense of soft-assignment coding , 2011, 2011 International Conference on Computer Vision.

[23]  Jing Huang,et al.  Image indexing using color correlograms , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Yasuo Kuniyoshi,et al.  Discriminative spatial pyramid , 2011, CVPR 2011.

[25]  Jiajun Wang,et al.  Spatial context for visual vocabulary construction , 2010, 2010 International Conference on Image Analysis and Signal Processing.

[26]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[27]  Shin'ichi Satoh,et al.  Building Compact Local Pairwise Codebook with Joint Feature Space Clustering , 2010, ECCV.

[28]  Shawn D. Newsam,et al.  Spatial pyramid co-occurrence for image classification , 2011, 2011 International Conference on Computer Vision.

[29]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Changhu Wang,et al.  Spatial-bag-of-features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[32]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[33]  Silvio Savarese,et al.  Discriminative Object Class Models of Appearance and Shape by Correlatons , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[35]  Cécile Barat,et al.  Spatial orientations of visual word pairs to improve Bag-of-Visual-Words model , 2012, BMVC.

[36]  Zhen Li,et al.  Hierarchical Gaussianization for image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Fahad Shahbaz Khan,et al.  Discriminative compact pyramids for object and scene recognition , 2012, Pattern Recognition.

[38]  Pierre Tirilly,et al.  Language modeling for bag-of-visual words image categorization , 2008, CIVR '08.

[39]  Devi Parikh Recognizing jumbled images: The role of local and global information in image classification , 2011, 2011 International Conference on Computer Vision.

[40]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[41]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[43]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Xiangyang Xue,et al.  Incorporating Spatial Correlogram into Bag-of-Features Model for Scene Categorization , 2009, ACCV.

[45]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[46]  Alberto Del Bimbo,et al.  Local Pyramidal Descriptors for Image Recognition , 2014, IEEE Trans. Pattern Anal. Mach. Intell..

[47]  Nenghai Yu,et al.  Visual language modeling for image classification , 2007, MIR '07.

[48]  Michael Isard,et al.  Bundling features for large scale partial-duplicate web image search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  David G. Lowe,et al.  Spatially Local Coding for Object Recognition , 2012, ACCV.

[50]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[51]  Inderjit S. Dhillon,et al.  A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification , 2003, J. Mach. Learn. Res..

[52]  Matthieu Cord,et al.  Pooling in image representation: The visual codeword point of view , 2013, Comput. Vis. Image Underst..

[53]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Neural Networks , 2013 .

[54]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[55]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[56]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[57]  N. H. C. Yung,et al.  Scene categorization via contextual visual words , 2010, Pattern Recognit..

[58]  Thomas Deselaers,et al.  Global and efficient self-similarity for object classification and detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[59]  Ming Yang,et al.  From frequent itemsets to semantically meaningful visual patterns , 2007, KDD '07.

[60]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.