Robust Dictionary Learning on the Hilbert Sphere in Kernel Feature Space

This paper presents a novel dictionary learning method that operates in the kernel feature space of a reproducing kernel Hilbert space (RKHS). Our method focuses on several popular kernels, e.g., radial basis function kernels such as the Gaussian, that implicitly map data onto a Hilbert sphere, a Riemannian manifold, in RKHS. Unlike typical kernel dictionary learning methods, which apply linear methods in RKHS, our method exploits this manifold structure of the mapped data. We show that dictionary learning on a Hilbert sphere in RKHS requires only the Gram matrix, not the explicit lifting map underlying the kernel. In place of the typical $$L^1$$ norm sparsity prior, we impose a non-convex $$L^p$$ quasi-norm penalty, with $$p < 1$$, on the coefficients to enforce a stronger sparsity prior and to make dictionary learning more robust to corrupted training data. We evaluate our method for image classification on two large publicly available datasets and demonstrate improved performance over state-of-the-art dictionary learning methods.
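
The two ingredients named above can be illustrated with a minimal sketch. It is not the paper's algorithm; the function names (`gaussian_gram`, `geodesic_distances`, `lp_penalty`), the bandwidth `sigma`, and the random data are all assumptions made for illustration. The sketch relies only on standard facts: a Gaussian kernel satisfies $$k(x, x) = 1$$, so mapped points have unit norm in RKHS, and the great-circle distance between two mapped points is the arccosine of their kernel value, which can be read directly off the Gram matrix.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel for the rows of X (illustrative helper)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)      # guard against tiny negative round-off
    np.fill_diagonal(d2, 0.0)     # the distance of a point to itself is exactly 0
    return np.exp(-d2 / (2.0 * sigma**2))

def geodesic_distances(G):
    """Great-circle distances on the unit Hilbert sphere, from the Gram matrix only."""
    # For unit-norm mapped points, <phi(x), phi(y)> = k(x, y) is the cosine of the angle.
    return np.arccos(np.clip(G, -1.0, 1.0))

def lp_penalty(w, p=0.5):
    """Non-convex L^p quasi-norm penalty sum_i |w_i|^p, intended for 0 < p < 1."""
    return np.sum(np.abs(w) ** p)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))       # hypothetical data: 5 points in R^3
G = gaussian_gram(X)
D = geodesic_distances(G)

# Two coefficient vectors with the same L^1 norm: one sparse, one dense.
sparse_w = np.array([1.0, 0.0])
dense_w = np.array([0.5, 0.5])
penalties = (lp_penalty(sparse_w), lp_penalty(dense_w))
```

In this sketch the diagonal of `G` is all ones, confirming that the mapped points lie on a unit sphere, and the explicit lifting map never appears. The two coefficient vectors have equal $$L^1$$ norm, yet the $$L^p$$ penalty with $$p = 0.5$$ is lower for the sparse one, which is the sense in which $$p < 1$$ favors sparsity more strongly.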
