Descriptor extraction based on a multilayer dictionary architecture for classification of natural images

Abstract This paper presents a descriptor extraction method in the context of image classification, based on a multilayer structure of dictionaries. We propose to learn an architecture of discriminative dictionaries for classification in a supervised framework using a patch-level approach. This method combines many layers of sparse coding and pooling in order to reduce the dimension of the problem. The supervised learning of dictionary atoms allows them to be specialized for a classification task. The method has been tested on known datasets of natural images such as MNIST, CIFAR-10 and STL, in various conditions, especially when the size of the training set is limited, and in a transfer learning application. The results are also compared with those obtained with Convolutional Neural Network (CNN) of similar complexity in terms of number of layers and processing pipeline.

[1]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[2]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[3]  J. Andrew Bagnell,et al.  Differential Sparse Coding , 2008 .

[4]  Dieter Fox,et al.  Unsupervised Feature Learning for RGB-D Based Object Recognition , 2012, ISER.

[5]  Léon Bottou,et al.  Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.

[6]  R. Tibshirani The Lasso Problem and Uniqueness , 2012, 1206.0313.

[7]  Michael Elad,et al.  Sparse Representation for Color Image Restoration , 2008, IEEE Transactions on Image Processing.

[8]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[9]  Michael Elad,et al.  Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.

[10]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[11]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[12]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[14]  Jean Ponce,et al.  Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Stéphane Mallat,et al.  Deep roto-translation scattering for object classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Denis Pellerin,et al.  Rejection-based classification for action recognition using a spatio-temporal dictionary , 2015, 2015 23rd European Signal Processing Conference (EUSIPCO).

[17]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[18]  Rama Chellappa,et al.  Sparse dictionary-based representation and recognition of action attributes , 2011, 2011 International Conference on Computer Vision.

[19]  Thomas S. Huang,et al.  Supervised translation-invariant sparse coding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[22]  Mayank Vatsa,et al.  Deep Dictionary Learning , 2016, IEEE Access.

[23]  Cordelia Schmid,et al.  Convolutional Kernel Networks , 2014, NIPS.

[24]  David Zhang,et al.  Sparse Representation Based Fisher Discrimination Dictionary Learning for Image Classification , 2014, International Journal of Computer Vision.

[25]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[28]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[29]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[30]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[31]  H. T. Kung,et al.  Stable and Efficient Representation Learning with Nonnegativity Constraints , 2014, ICML.

[32]  Pascal Frossard,et al.  Dictionary Learning for Fast Classification Based on Soft-thresholding , 2014, International Journal of Computer Vision.

[33]  E HintonGeoffrey,et al.  ImageNet classification with deep convolutional neural networks , 2017 .

[34]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[35]  S. Mallat,et al.  Invariant Scattering Convolution Networks , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[37]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.