Kernel Principal Geodesic Analysis

Kernel principal component analysis (kPCA) has been proposed as a dimensionality-reduction technique that achieves nonlinear, low-dimensional representations of data via a mapping to kernel feature space. Conventionally, kPCA relies on Euclidean statistics in kernel feature space. However, Euclidean analysis can make kPCA inefficient or incorrect for many popular kernels that map input points to a hypersphere in kernel feature space. To address this problem, this paper proposes a novel adaptation of kPCA, namely kernel principal geodesic analysis (kPGA), for hyperspherical statistical analysis in kernel feature space. This paper proposes tools for statistical analyses on the Riemannian manifold of the Hilbert sphere in the reproducing kernel Hilbert space, including algorithms for computing the sample weighted Karcher mean and eigen analysis of the sample weighted Karcher covariance. It then applies these tools to propose novel methods for (i) dimensionality reduction and (ii) clustering using mixture-model fitting. The results, on simulated and real-world data, show that kPGA-based methods perform favorably relative to their kPCA-based analogs.
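The paper's algorithms operate implicitly in the RKHS, carrying the updates in terms of the Gram matrix rather than explicit feature vectors. As a rough illustration of the underlying hyperspherical geometry only, the following minimal sketch computes a weighted Karcher mean and a tangent-space eigen analysis for explicit unit vectors on the sphere in R^d; the function names (log_map, exp_map, karcher_mean, tangent_pca) are hypothetical and this is not the paper's kernelized algorithm.

```python
import numpy as np

def log_map(mu, x, eps=1e-12):
    """Log map on the unit sphere: tangent vector at mu pointing toward x.

    Undefined at antipodal points; this sketch simply returns zero there.
    """
    dot = np.clip(np.dot(mu, x), -1.0, 1.0)
    theta = np.arccos(dot)            # geodesic distance between mu and x
    v = x - dot * mu                  # component of x orthogonal to mu
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return np.zeros_like(x)
    return theta * v / norm_v

def exp_map(mu, v, eps=1e-12):
    """Exp map on the unit sphere: walk from mu along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return mu
    return np.cos(norm_v) * mu + np.sin(norm_v) * v / norm_v

def karcher_mean(X, w=None, iters=100, tol=1e-9):
    """Weighted Karcher mean of unit row vectors X via gradient descent."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n) if w is None else w / w.sum()
    mu = X[0].copy()
    for _ in range(iters):
        # average the data in the tangent space at the current estimate,
        # then map the result back onto the sphere
        v = sum(wi * log_map(mu, xi) for wi, xi in zip(w, X))
        mu = exp_map(mu, v)
        if np.linalg.norm(v) < tol:
            break
    return mu

def tangent_pca(X, w=None):
    """Eigen analysis of the weighted Karcher covariance at the mean."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n) if w is None else w / w.sum()
    mu = karcher_mean(X, w)
    V = np.array([log_map(mu, x) for x in X])   # lift data to the tangent space at mu
    C = (V * w[:, None]).T @ V                  # weighted sample covariance
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]             # principal directions first
    return mu, evals[order], evecs[:, order]
```

For example, `tangent_pca(X)` with rows of `X` normalized to unit length returns the intrinsic mean together with the eigenvalues and eigenvectors of the Karcher covariance; in kPGA the analogous quantities are computed from the kernel Gram matrix, since points in the RKHS are only known through inner products.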
