Efficient Bag of Scenes Analysis for Image Categorization

In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is efficient for low-level semantic applications such as texture classification as well as for higher-level semantic tasks such as natural scene recognition or fine-grained visual categorization (FGVC). It is based on the widely used combination of i) Sparse coding (Sc), ii) Max-pooling and iii) Spatial Pyramid Matching (SPM) techniques, applied to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) and their improved variants. The approach can be viewed as a two-layer hierarchical architecture: the first layer encodes the local spatial structure of each patch via histograms of LBP/LTP, while the second layer encodes the relationships between the pre-analyzed LBP/LTP scenes/objects. Our method outperforms SIFT-based approaches that use Sc techniques and can be trained efficiently with a simple linear SVM.
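The two-layer pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a basic single-scale 8-neighbour LBP (no LTP or multi-scale variants), non-overlapping patches, a dictionary made of randomly sampled patch histograms instead of a learned one, an ISTA solver for the sparse coding step, and max-pooling of absolute coefficients. All function names and parameter values (`patch`, `lam`, `levels`) are hypothetical choices made for the example.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP code (0..255) for every interior pixel."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def patch_histograms(codes, patch=8):
    """First layer: one L1-normalised 256-bin LBP histogram per patch,
    plus the patch centre in relative coordinates."""
    H, W = codes.shape
    feats, pos = [], []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            hist = np.bincount(codes[y:y + patch, x:x + patch].ravel(), minlength=256)
            feats.append(hist / hist.sum())
            pos.append(((y + patch / 2) / H, (x + patch / 2) / W))
    return np.array(feats), np.array(pos)

def sparse_code(X, D, lam=0.01, n_iter=50):
    """ISTA for min_A 0.5 * ||X - A D||^2 + lam * ||A||_1 (rows of D are atoms)."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = A - ((A @ D - X) @ D.T) / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

def spm_max_pool(A, pos, levels=(1, 2, 4)):
    """Second layer: max-pool absolute codes over each cell of a spatial pyramid."""
    pooled = []
    for g in levels:
        cell = np.minimum((pos * g).astype(int), g - 1)
        idx = cell[:, 0] * g + cell[:, 1]
        for c in range(g * g):
            mask = idx == c
            pooled.append(np.abs(A[mask]).max(axis=0) if mask.any()
                          else np.zeros(A.shape[1]))
    v = np.concatenate(pooled)
    return v / (np.linalg.norm(v) + 1e-12)

# Toy usage on a random image (a stand-in for real data).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(66, 66))
feats, pos = patch_histograms(lbp_codes(img), patch=8)
D = feats[rng.choice(len(feats), size=16, replace=False)]  # assumed random dictionary
D = D / np.linalg.norm(D, axis=1, keepdims=True)
v = spm_max_pool(sparse_code(feats, D), pos)
print(v.shape)  # 16 atoms x (1 + 4 + 16) pyramid cells
```

The resulting vector `v` is the kind of image-level descriptor that would then be passed to a linear SVM; the classifier itself is left out of the sketch.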
