Three-layer Spatial Sparse Coding for Image Classification

In this paper, we propose a three-layer spatial sparse coding (TSSC) for image classification, aiming at three objectives: naturally recognizing image categories without learning phase, naturally involving spatial configurations of images, and naturally counteracting the intra-class variances. The method begins by representing the test images in a spatial pyramid as the to-be-recovered signals, and taking all sampled image patches at multiple scales from the labeled images as the bases. Then, three sets of coefficients are involved into the cardinal sparse coding to get the TSSC, one to penalize spatial inconsistencies of the pyramid cells and the corresponding selected bases, one to guarantee the sparsity of selected images, and the other to guarantee the sparsity of selected categories. Finally, the test images are classified according to a simple image-to-category similarity defined on the coding coefficients. In experiments, we test our method on two publicly available datasets and achieve significantly more accurate results than the conventional sparse coding with only a modest increase in computational complexity.

[1]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[3]  Hong Cheng,et al.  Sparsity induced similarity measure for label propagation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  James M. Rehg,et al.  Where am I: Place instance and category recognition using spatial PACT , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[7]  Dengxin Dai,et al.  Discovering scene categories by information projection and cluster sampling , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[9]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[11]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.