Sparse Spatial Coding: A Novel Approach to Visual Recognition

Successful image-based object recognition techniques have been constructed founded on powerful techniques such as sparse representation, in lieu of the popular vector quantization approach. However, one serious drawback of sparse space-based methods is that local features that are quite similar can be quantized into quite distinct visual words. We address this problem with a novel approach for object recognition, called sparse spatial coding, which efficiently combines a sparse coding dictionary learning and spatial constraint coding stage. We performed experimental evaluation using the Caltech 101, Caltech 256, Corel 5000, and Corel 10000 data sets, which were specifically designed for object recognition evaluation. Our results show that our approach achieves high accuracy comparable with the best single feature method previously published on those databases. Our method outperformed, for the same bases, several multiple feature methods, and provided equivalent, and in few cases, slightly less accurate results than other techniques specifically designed to that end. Finally, we report state-of-the-art results for scene recognition on COsy Localization Dataset (COLD) and high performance results on the MIT-67 indoor scene recognition, thus demonstrating the generalization of our approach for such tasks.

[1]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[3]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Horst Bischof,et al.  On-line Random Forests , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[5]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[6]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[7]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[8]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[9]  Vincent Lepetit,et al.  Are sparse representations really relevant for image classification? , 2011, CVPR 2011.

[10]  Fei-Fei Li,et al.  Large Margin Learning of Upstream Scene Understanding Models , 2010, NIPS.

[11]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  B. Caputo,et al.  Cold: the Cosy Localization Database Cold: the Cosy Localization Database , 2009 .

[13]  Baoxin Li,et al.  Discriminative K-SVD for dictionary learning in face recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[15]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[16]  Antoine Bordes,et al.  Sequence Labelling SVMs Trained in One Pass , 2008, ECML/PKDD.

[17]  Barbara Caputo,et al.  Towards robust place recognition for robot localization , 2008, 2008 IEEE International Conference on Robotics and Automation.

[18]  Michael Elad,et al.  Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.

[19]  Yasuo Kuniyoshi,et al.  Global Gaussian approach for scene categorization using information geometry , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Narendra Ahuja,et al.  Learning subcategory relevances for category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  John D. Lafferty,et al.  Learning image representations from the pixel level via hierarchical sparse coding , 2011, CVPR 2011.

[22]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[23]  Michael Elad,et al.  Learning Multiscale Sparse Representations for Image and Video Restoration , 2007, Multiscale Model. Simul..

[24]  Zhiwu Lu,et al.  Image categorization by learning with context and consistency , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Barbara Caputo,et al.  Multi Kernel Learning with Online-Batch Optimization , 2012, J. Mach. Learn. Res..

[26]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[27]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Wen Gao,et al.  Group-sensitive multiple kernel learning for object categorization , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Horst Bischof,et al.  Online multi-class LPBoost , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[31]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Alfred O. Hero,et al.  Efficient learning of sparse, distributed, convolutional feature representations for object recognition , 2011, 2011 International Conference on Computer Vision.

[33]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[34]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[35]  Michael Elad,et al.  Sparse Representation for Color Image Restoration , 2008, IEEE Transactions on Image Processing.

[36]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[37]  Michael Elad,et al.  Multi-Scale Dictionary Learning Using Wavelets , 2011, IEEE Journal of Selected Topics in Signal Processing.

[38]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[39]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Ohad Shamir,et al.  Optimal Distributed Online Prediction Using Mini-Batches , 2010, J. Mach. Learn. Res..

[42]  Baoxin Li,et al.  Discriminative affine sparse codes for image classification , 2011, CVPR 2011.

[43]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[44]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[45]  Larry S. Davis,et al.  Learning a discriminative dictionary for sparse coding via label consistent K-SVD , 2011, CVPR 2011.

[46]  Emmanuelle Frenoux,et al.  Temporal Bag-of-Words - A Generative Model for Visual Place Recognition using Temporal Integration , 2011, VISAPP.

[47]  Mario Fernando Montenegro Campos,et al.  Sparse Spatial Coding: A novel approach for efficient and accurate object recognition , 2012, 2012 IEEE International Conference on Robotics and Automation.

[48]  Jason Weston,et al.  Solving multiclass support vector machines with LaRank , 2007, ICML '07.

[49]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).