Dictionary Learning and Sparse Coding on Statistical Manifolds

In this paper, we propose a novel information-theoretic framework for dictionary learning (DL) and sparse coding (SC) on a statistical manifold, i.e., the manifold of probability distributions. Unlike traditional DL and SC frameworks, our formulation does not explicitly include any sparsity-inducing norm in the cost function being optimized, yet it still yields sparse codes. Our algorithm approximates the data points on the statistical manifold, which are themselves probability distributions, by the weighted Kullback-Leibler center, or mean (KL-center), of the dictionary atoms. The KL-center of a set is defined as the point that minimizes the maximum KL-divergence between itself and the members of the set. We further prove that the weighted KL-center is a sparse combination of the dictionary atoms, and that this result continues to hold when the KL-divergence is replaced by the well-known Hellinger distance. From an applications perspective, we extend the framework to the manifold of symmetric positive definite matrices, $\mathcal{P}_n$, which can be identified with the manifold of zero-mean Gaussian distributions. We present experiments on a variety of dictionary-based reconstruction and classification problems in computer vision, and we demonstrate the performance of the proposed algorithm by comparing it with several state-of-the-art methods in terms of reconstruction accuracy, classification accuracy, and sparsity of the resulting representation.
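The minimax definition of the KL-center can be made concrete for a finite set of discrete distributions. The following is a minimal sketch, not the authors' algorithm, under two assumptions that the abstract does not fix: the divergence is taken in the direction $KL(p_i \| q)$, and only the unweighted KL-center is computed (the weighted variant and the extension to $\mathcal{P}_n$ are not covered). Under these assumptions, the redundancy-capacity theorem of information theory states that $\min_q \max_i KL(p_i \| q)$ equals the capacity of the channel whose rows are the $p_i$, and that the minimizer is the mixture $q^* = \sum_i w_i p_i$ with $w$ a capacity-achieving input distribution. Any $p_i$ lying strictly inside the minimax ball receives $w_i = 0$, which is a simple instance of the sparsity described above. The sketch computes $w$ with the Blahut-Arimoto iteration; the function names `kl_rows` and `kl_center` are hypothetical.

```python
import numpy as np


def kl_rows(P, q):
    """Return KL(p_i || q) for every row p_i of P, using 0 * log(0 / q) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P / q), 0.0)
    return terms.sum(axis=1)


def kl_center(P, tol=1e-9, max_iter=10_000):
    """Approximate argmin_q max_i KL(p_i || q) over the rows p_i of P.

    Blahut-Arimoto iteration: w converges to a capacity-achieving input
    distribution of the channel with rows p_i, and the KL-center is the
    mixture q = w @ P. Returns (q, w, max_i KL(p_i || q)).
    """
    P = np.asarray(P, dtype=float)
    w = np.full(P.shape[0], 1.0 / P.shape[0])  # uniform initial weights
    for _ in range(max_iter):
        q = w @ P                 # current candidate center (a mixture of atoms)
        d = kl_rows(P, q)         # divergence of each atom from the candidate
        # w @ d (mutual information) <= minimax radius <= d.max(); stop when tight
        if d.max() - w @ d < tol:
            break
        # multiplicative update w_i <- w_i * exp(d_i), shifted for stability
        w = w * np.exp(d - d.max())
        w /= w.sum()
    q = w @ P
    return q, w, kl_rows(P, q).max()


# Usage: the third distribution is the midpoint of the first two.
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
q, w, radius = kl_center(P)
print(np.round(q, 6), np.round(w, 6), round(float(radius), 6))
```

In this usage example the midpoint atom coincides with the center, so its weight is driven toward zero while the two extreme atoms each receive a weight close to 0.5, and the minimax radius approaches $\log 2$. Blahut-Arimoto only shrinks inactive weights geometrically, so obtaining exact zeros in practice would require thresholding the returned weights.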
