When Dictionary Learning Meets Deep Learning: Deep Dictionary Learning and Coding Network for Image Recognition With Limited Data

We present a new deep dictionary learning and coding network (DDLCN) for image-recognition tasks with limited data. The proposed DDLCN has most of the standard deep learning layers (e.g., input/output, pooling, and fully connected), but the fundamental convolutional layers are replaced by our proposed compound dictionary learning and coding layers. The dictionary learning layer learns an overcomplete dictionary for the input training data. At the deep coding layer, a locality constraint is added to guarantee that the activated dictionary bases are close to each other. The activated dictionary atoms are then assembled and passed to the subsequent compound dictionary learning and coding layers. In this way, the activated atoms in the first layer can be represented by the deeper atoms in the second dictionary. Intuitively, the second dictionary is designed to learn the fine-grained components shared among the input dictionary atoms; thus, a more informative and discriminative low-level representation of the dictionary atoms can be obtained. We empirically compare DDLCN with several leading dictionary learning methods and deep learning models. Experimental results on five popular data sets show that DDLCN achieves competitive results compared with state-of-the-art methods when the training data are limited. Code is available at https://github.com/Ha0Tang/DDLCN.
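
The two-layer coding idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes an approximate locality-constrained coding step in which each input is reconstructed from its k nearest atoms with weights that sum to one, and it assumes the final feature is the first-layer code concatenated with the second-layer codes of the activated atoms, each scaled by its first-layer coefficient. The function names `llc_code` and `two_layer_code`, the parameters `k` and `reg`, and the random dictionaries are all hypothetical; in practice the dictionaries would be learned from training data.

```python
import numpy as np


def llc_code(x, D, k=5, reg=1e-4):
    """Code x (shape d) over dictionary D (shape d x m) using only its k nearest atoms.

    Assumed coding rule: regularised least squares over the k nearest atoms,
    with the weights normalised to sum to one.
    """
    dists = np.linalg.norm(D - x[:, None], axis=0)   # distance from x to every atom
    idx = np.argsort(dists)[:k]                      # locality: keep the k closest atoms
    Z = D[:, idx] - x[:, None]                       # centre the selected atoms on x
    C = Z.T @ Z                                      # k x k local Gram matrix
    C += reg * max(np.trace(C), 1e-12) * np.eye(k)   # regularise for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                     # enforce the sum-to-one constraint
    code = np.zeros(D.shape[1])
    code[idx] = w                                    # only activated atoms are non-zero
    return code, idx


def two_layer_code(x, D1, D2, k=5):
    """Code x over D1, then code each activated D1 atom over D2 (assumed assembly)."""
    code1, active = llc_code(x, D1, k)
    # Each activated first-layer atom is itself represented by second-layer atoms.
    codes2 = np.stack([llc_code(D1[:, j], D2, k)[0] for j in active])
    # Scale each second-layer code by the matching first-layer coefficient.
    weighted = code1[active, None] * codes2
    return np.concatenate([code1, weighted.ravel()])


rng = np.random.default_rng(0)
D1 = rng.standard_normal((8, 20))   # stand-in first-layer dictionary (not learned)
D2 = rng.standard_normal((8, 30))   # stand-in second-layer dictionary (not learned)
x = rng.standard_normal(8)
feature = two_layer_code(x, D1, D2, k=5)
print(feature.shape)
```

With 20 first-layer atoms, 30 second-layer atoms, and k = 5, the assembled feature has 20 + 5 × 30 entries. Such a feature would then feed the pooling and fully connected stages mentioned above.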
