An information theoretic formulation of the Dictionary Learning and Sparse Coding Problems on Statistical Manifolds

In this work, we propose a novel information theoretic framework for dictionary learning (DL) and sparse coding (SC) on a statistical manifold (the manifold of probability distributions). Unlike the traditional DL and SC framework, our new formulation {\it does not explicitly incorporate any sparsity inducing norm in the cost function but yet yields SCs}. Moreover, we extend this framework to the manifold of symmetric positive definite matrices, $\mathcal{P}_n$. Our algorithm approximates the data points, which are probability distributions, by the weighted Kullback-Leibeler center (KL-center) of the dictionary atoms. The KL-center is the minimizer of the maximum KL-divergence between the unknown center and members of the set whose center is being sought. Further, {\it we proved that this KL-center is a sparse combination of the dictionary atoms}. Since, the data reside on a statistical manifold, the data fidelity term can not be as simple as in the case of the vector-space data. We therefore employ the geodesic distance between the data and a sparse approximation of the data element. This cost function is minimized using an acceleterated gradient descent algorithm. An extensive set of experimental results show the effectiveness of our proposed framework. We present several experiments involving a variety of classification problems in Computer Vision applications. Further, we demonstrate the performance of our algorithm by comparing it to several state-of-the-art methods both in terms of classification accuracy and sparsity.

[1]  Guillermo Sapiro,et al.  Online dictionary learning for sparse coding , 2009, ICML '09.

[2]  Anoop Cherian,et al.  Riemannian Sparse Coding for Positive Definite Matrices , 2014, ECCV.

[3]  John Wright,et al.  Complete dictionary recovery over the sphere , 2015, 2015 International Conference on Sampling Theory and Applications (SampTA).

[4]  Jean-Luc Starck,et al.  Sparse Solution of Underdetermined Systems of Linear Equations by Stagewise Orthogonal Matching Pursuit , 2012, IEEE Transactions on Information Theory.

[5]  Søren Hauberg,et al.  Geodesic exponential kernels: When curvature and linearity conflict , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Lei Zhang,et al.  Log-Euclidean Kernels for Sparse Representation and Dictionary Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Mashbat Suzuki,et al.  Information Geometry and Statistical Manifold , 2014 .

[8]  Anoop Cherian,et al.  Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval , 2011, ECML/PKDD.

[9]  Calyampudi R. Rao,et al.  Chapter 3: Differential and Integral Geometry in Statistical Inference , 1987 .

[10]  Nicholas Ayache,et al.  Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices , 2007, SIAM J. Matrix Anal. Appl..

[11]  David J. Kriegman,et al.  Acquiring linear subspaces for face recognition under variable lighting , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Saman A. Zonouz,et al.  CloudID: Trustworthy cloud-based and cross-enterprise biometric identification , 2015, Expert Syst. Appl..

[13]  Michael Elad,et al.  K-SVD and its non-negative variant for dictionary design , 2005, SPIE Optics + Photonics.

[14]  Anuj Srivastava,et al.  Riemannian Analysis of Probability Density Functions with Applications in Vision , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Andy Harter,et al.  Parameterisation of a stochastic model for human face identification , 1994, Proceedings of 1994 IEEE Workshop on Applications of Computer Vision.

[16]  P. Thomas Fletcher,et al.  Riemannian geometry for the statistical analysis of diffusion tensor data , 2007, Signal Process..

[17]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[18]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[19]  Mehrtash Tafazzoli Harandi,et al.  Riemannian coding and dictionary learning: Kernels to the rescue , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Baba C. Vemuri,et al.  On A Nonlinear Generalization of Sparse Coding and Dictionary Learning , 2013, ICML.

[21]  Kenneth I. Laws,et al.  Rapid Texture Identification , 1980, Optics & Photonics.

[22]  Guillermo Sapiro,et al.  Sparse Representation for Computer Vision and Pattern Recognition , 2010, Proceedings of the IEEE.

[23]  E. Candès,et al.  Stable signal recovery from incomplete and inaccurate measurements , 2005, math/0503066.

[24]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[25]  Brian C. Lovell,et al.  Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach , 2012, ECCV.

[26]  C. R. Rao,et al.  Fisher-Rao metric , 2009, Scholarpedia.

[27]  Vassilios Morellas,et al.  Tensor Sparse Coding for Region Covariances , 2010, ECCV.

[28]  Rudrasis Chakraborty,et al.  An efficient recursive estimator of the Fréchet mean on a hypersphere with applications to Medical Image Analysis , 2015 .

[29]  Zhizhou Wang,et al.  DTI segmentation using an information theoretic tensor dissimilarity measure , 2005, IEEE Transactions on Medical Imaging.

[30]  Xilin Chen,et al.  Projection Metric Learning on Grassmann Manifold with Application to Video based Face Recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Hasan Ertan Ceting Intrinsic Mean Shift for Clustering on Stiefel and Grassmann Manifolds , 2009 .

[33]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[34]  Larry S. Davis,et al.  Learning Discriminative Appearance-Based Models Using Partial Least Squares , 2009, 2009 XXII Brazilian Symposium on Computer Graphics and Image Processing.

[35]  Sébastien Bubeck,et al.  Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..

[36]  Jana Reinhard,et al.  Textures A Photographic Album For Artists And Designers , 2016 .

[37]  Rudrasis Chakraborty,et al.  Recursive Fréchet Mean Computation on the Grassmannian and Its Applications to Computer Vision , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[39]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[40]  B. Vemuri,et al.  Fusing probability distributions with information theoretic centers and its application to data retrieval , 2005 .

[41]  Sébastien Bubeck,et al.  Theory of Convex Optimization for Machine Learning , 2014, ArXiv.

[42]  Janusz Konrad,et al.  Action Recognition Using Sparse Representation on Covariance Manifolds of Optical Flow , 2010, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance.