Beyond SIFT for Image Categorization by Bag-of-Scenes Analysis

In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is efficient both for low-level semantic applications, such as texture classification, and for higher-level semantic tasks, such as natural scene recognition. It builds on the widely used combination of (i) sparse coding (Sc), (ii) max-pooling and (iii) Spatial Pyramid Matching (SPM), applied here to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) used as local features. This approach can be viewed as a two-layer hierarchical architecture: the first layer quickly encodes the local spatial structure of each patch via histograms of LBP/LTP, while the second layer encodes the relationships between these pre-analyzed LBP/LTP scenes/objects. To provide comparative results, we also introduce an alternative two-layer architecture in which the first layer directly encodes multi-scale Differential Vector (DV) local patches instead of LBP/LTP histograms. Our method outperforms SIFT-based approaches that use Sc techniques and can be trained efficiently with a simple linear SVM. BoS achieves accuracies of \(87.46\,\%\) and \(90.35\,\%\) on the Scene-15 and UIUC-Sport datasets, respectively.
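The two-layer pipeline described above (LBP histograms per patch, then sparse coding, max-pooling and a spatial pyramid) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions not taken from the paper: a single-scale, radius-1, 8-neighbour LBP (no LTP and no multi-scale variant), non-overlapping 8×8 patches, a dictionary built by sampling and normalizing patch histograms instead of being learned, ISTA as the solver for an L1-penalized (Lasso-style) sparse coding objective, and a 1×1/2×2/4×4 pyramid. The function names `lbp_codes`, `patch_histograms`, `sparse_code` and `spm_max_pool` are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour, radius-1 LBP code for every interior pixel (assumed variant)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left; bit k is set if neighbour k >= centre.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for k, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= centre).astype(np.int64) << k
    return codes

def patch_histograms(codes, patch=8, step=8):
    """First layer: normalized 256-bin LBP histogram for each patch on a regular grid.
    Returns descriptors (n, 256) and patch centres (n, 2) as (row, col)."""
    h, w = codes.shape
    descs, centres = [], []
    for y in range(0, h - patch + 1, step):
        for x in range(0, w - patch + 1, step):
            block = codes[y:y + patch, x:x + patch].ravel()
            hist = np.bincount(block, minlength=256).astype(float)
            descs.append(hist / hist.sum())
            centres.append((y + patch / 2, x + patch / 2))
    return np.array(descs), np.array(centres)

def sparse_code(X, D, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1, vectorized over the rows of X.
    X: (n, d) descriptors, D: (d, K) dictionary with unit-norm columns. Returns (n, K)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term's gradient
    A = np.zeros((X.shape[0], D.shape[1]))
    for _ in range(n_iter):
        grad = (A @ D.T - X) @ D           # row-wise D^T (D a - x)
        Z = A - grad / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft-thresholding
    return A

def spm_max_pool(A, centres, shape, levels=(1, 2, 4)):
    """Second layer: max of |codes| in every cell of each pyramid level, concatenated."""
    h, w = shape
    pooled = []
    for g in levels:
        rows = np.minimum((centres[:, 0] * g // h).astype(int), g - 1)
        cols = np.minimum((centres[:, 1] * g // w).astype(int), g - 1)
        for i in range(g):
            for j in range(g):
                mask = (rows == i) & (cols == j)
                cell = np.abs(A[mask]).max(axis=0) if mask.any() else np.zeros(A.shape[1])
                pooled.append(cell)
    return np.concatenate(pooled)

# Usage on a random 66x66 "image" (synthetic data, for illustration only).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(66, 66))
codes = lbp_codes(img)                                 # (64, 64)
X, centres = patch_histograms(codes)                   # 64 patches of 8x8
D = X[rng.choice(len(X), 16, replace=False)].T         # (256, 16), sampled atoms
D /= np.linalg.norm(D, axis=0)                         # unit-norm columns
A = sparse_code(X, D, lam=0.01)
feat = spm_max_pool(A, centres, codes.shape)
print(feat.shape)  # (336,)
```

With a K-atom dictionary the pooled vector has length K × (1 + 4 + 16) = 21K, here 336 for K = 16. A vector of this kind would then be passed to a linear SVM (for example scikit-learn's `LinearSVC`); the dictionary size, sparsity weight and classifier settings used in the paper are not reproduced here.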