Orthogonality-Promoting Dictionary Learning via Bayesian Inference

Dictionary learning (DL) plays a crucial role in numerous machine learning tasks. It aims to find a dictionary over which the training set admits a maximally sparse representation. Most existing DL algorithms solve an optimization problem in which the noise variance and sparsity level must be known in advance. In practical applications, however, this prior knowledge is difficult to obtain. Non-parametric Bayesian DL has therefore recently received much attention from researchers because of its adaptability and effectiveness. Although many hierarchical priors have been used to promote sparsity of the representation in non-parametric Bayesian DL, the redundancy of the dictionary is still overlooked, which greatly degrades the performance of sparse coding. To address this problem, this paper presents a novel robust dictionary learning framework via Bayesian inference. In particular, we employ an orthogonality-promoting regularization to mitigate correlations among dictionary atoms. Such a regularization, which encourages the dictionary atoms to be close to orthogonal, can alleviate overfitting to the training data and improve the discriminative ability of the model. Moreover, we impose a Scale mixture of the Vector variate Gaussian (SMVG) distribution on the noise to capture its structure. A regularized expectation-maximization algorithm is developed to estimate the posterior distribution of the representation and the dictionary under the orthogonality-promoting regularization. Numerical results show that our method learns the dictionary more accurately than existing methods, especially when the number of training signals is limited.
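
As an illustration of what an orthogonality-promoting regularizer measures, the sketch below computes one common choice, the squared Frobenius distance between the Gram matrix of the atoms and the identity, ||D^T D - I||_F^2. This specific penalty is an assumption made for illustration; the abstract does not state which regularizer the paper uses. The function names `normalize_columns` and `orthogonality_penalty` are hypothetical.

```python
import math


def normalize_columns(D):
    """Scale each column (atom) of D to unit Euclidean norm.

    D is a list of rows; columns are assumed to be non-zero.
    """
    n_rows, n_cols = len(D), len(D[0])
    norms = [math.sqrt(sum(D[i][j] ** 2 for i in range(n_rows)))
             for j in range(n_cols)]
    return [[D[i][j] / norms[j] for j in range(n_cols)]
            for i in range(n_rows)]


def orthogonality_penalty(D):
    """Return ||D^T D - I||_F^2 (an assumed, illustrative penalty).

    The value is zero exactly when the atoms are orthonormal and grows
    with the pairwise correlations between atoms.
    """
    n_rows, n_cols = len(D), len(D[0])
    penalty = 0.0
    for j in range(n_cols):
        for k in range(n_cols):
            # Entry (j, k) of the Gram matrix D^T D.
            gram_jk = sum(D[i][j] * D[i][k] for i in range(n_rows))
            target = 1.0 if j == k else 0.0
            penalty += (gram_jk - target) ** 2
    return penalty


# Two orthonormal atoms versus two unit-norm atoms at a 45-degree angle.
orthogonal = [[1.0, 0.0],
              [0.0, 1.0]]
correlated = normalize_columns([[1.0, 1.0],
                                [0.0, 1.0]])

print(orthogonality_penalty(orthogonal))
print(orthogonality_penalty(correlated))
```

The orthonormal dictionary incurs no penalty, while the correlated pair is penalized through its off-diagonal Gram entries. Adding a weighted term of this kind to a learning objective is one way to discourage redundant atoms.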
