Tangent space estimation for smooth embeddings of Riemannian manifolds

Numerous dimensionality reduction problems in data analysis involve the recovery of low-dimensional models or the learning of manifolds underlying sets of data. Many manifold learning methods require the estimation of the tangent space of the manifold at a point from locally available data samples. Local sampling conditions such as (i) the size of the neighborhood (sampling width) and (ii) the number of samples in the neighborhood (sampling density) affect the performance of learning algorithms. In this work, we propose a theoretical analysis of local sampling conditions for the estimation of the tangent space at a point P lying on an m-dimensional Riemannian manifold S in R^n. Assuming a smooth embedding of S in R^n, we estimate the tangent space T_P S by performing a Principal Component Analysis (PCA) on points sampled from the neighborhood of P on S. Our analysis explicitly takes into account the second-order properties of the manifold at P, namely the principal curvatures, as well as higher-order terms. We consider a random sampling framework and leverage recent results from random matrix theory to derive conditions on the sampling width and the local sampling density for an accurate estimation of tangent subspaces. We measure the estimation accuracy by the angle between the estimated tangent space and the true tangent space T_P S, and we give conditions for this angle to be bounded with high probability. In particular, we observe that the local sampling conditions depend strongly on the correlation between the components of the second-order local approximation of the manifold. Finally, we provide numerical simulations to validate our theoretical findings.
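
As a concrete illustration of the procedure described above (a minimal sketch, not the paper's own code), the Python snippet below estimates the tangent plane of a quadratic surface in R^3 by running PCA on samples drawn from a small neighborhood of P, and measures the largest principal angle between the estimated and the true tangent space. The surface, the sampling width rho, and the sample count N are illustrative assumptions.

```python
import numpy as np

def estimate_tangent_space(points, p, m):
    """Estimate the m-dimensional tangent space at p via local PCA:
    the span of the top-m principal directions of the samples near p."""
    X = points - p                      # center the neighborhood at p
    # PCA via SVD of the centered data matrix (rows are samples)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:m].T                     # n-by-m orthonormal basis

def subspace_angle(U, V):
    """Largest principal angle (radians) between the column spans
    of two orthonormal bases U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of the principal angles
    return np.arccos(np.clip(s.min(), -1.0, 1.0))

# Toy example: S is the graph of a quadratic map, a 2-D manifold in R^3,
# with P the origin and true tangent space T_P S the x-y plane.
rng = np.random.default_rng(0)
rho, N, m = 0.1, 500, 2                 # sampling width, sampling density, intrinsic dim
xy = rng.uniform(-rho, rho, size=(N, m))
z = 0.5 * (xy[:, 0]**2 - xy[:, 1]**2)   # principal curvatures +1 and -1 at P
samples = np.column_stack([xy, z])

T_true = np.eye(3)[:, :m]               # true tangent basis at P = 0
T_hat = estimate_tangent_space(samples, np.zeros(3), m)
print("angle(T_hat, T_P S) =", subspace_angle(T_hat, T_true))
```

Shrinking rho (narrower neighborhood) or increasing N (denser sampling) reduces the angle in this sketch, in line with the sampling-width and sampling-density conditions analyzed in the paper.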
