Finite sample approximation results for principal component analysis: a matrix perturbation approach

Principal component analysis (PCA) is a standard tool for dimensionality reduction of a set of $n$ observations (samples), each with $p$ variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size $n$ and those of the limiting population PCA as $n\to\infty$. In the spirit of machine learning, we present a finite-sample theorem that holds with high probability, bounding the closeness between the leading eigenvalue and eigenvector of sample PCA and those of population PCA under a spiked covariance model. In addition, we relate finite-sample PCA to the asymptotic results in the joint limit $p,n\to\infty$ with $p/n=c$. We present a matrix perturbation view of the "phase transition phenomenon," and a simple linear-algebra-based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies at finite $p,n$: although there is no sharp phase transition as in the infinite case, we show that, either as a function of noise level or of sample size $n$, the leading eigenvector of sample PCA may exhibit an abrupt "loss of tracking," suddenly losing its relation to the true eigenvector of the population PCA matrix. This occurs when the eigenvalue arising from the signal crosses the largest eigenvalue arising from the noise, whose eigenvector points in a random direction.
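The phase transition referenced above can be seen numerically. The following is a minimal simulation sketch (my own illustration, not code from the paper), assuming the standard rank-one spiked covariance model $\Sigma = I_p + (\ell-1)vv^\top$ and the classical $p/n\to c$ predictions (Baik, Ben Arous and Péché; Paul): above the threshold $\ell > 1+\sqrt{c}$, the leading sample eigenvalue converges to $\ell\,(1 + c/(\ell-1))$ and the squared overlap $\langle u,v\rangle^2$ between the leading sample and population eigenvectors converges to $\bigl(1 - c/(\ell-1)^2\bigr)/\bigl(1 + c/(\ell-1)\bigr)$, while below the threshold the overlap vanishes.

```python
import numpy as np

# Minimal sketch of the rank-one spiked covariance model:
# Sigma = I_p + (ell - 1) v v^T, with spike strength ell > 1.
rng = np.random.default_rng(0)
p, n, ell = 200, 400, 4.0              # dimension, sample size, spike strength
c = p / n

v = rng.standard_normal(p)
v /= np.linalg.norm(v)                 # population leading eigenvector

# Samples x_i = sqrt(ell - 1) * g_i * v + z_i with g_i ~ N(0,1), z_i ~ N(0, I_p),
# so that Cov(x) = I_p + (ell - 1) v v^T as required.
X = np.sqrt(ell - 1.0) * rng.standard_normal((n, 1)) * v + rng.standard_normal((n, p))

S = X.T @ X / n                        # sample covariance (mean is zero by construction)
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
lam, u = eigvals[-1], eigvecs[:, -1]   # leading sample eigenvalue and eigenvector

# Classical asymptotic predictions, valid above the threshold ell > 1 + sqrt(c).
lam_pred = ell * (1.0 + c / (ell - 1.0))
overlap_pred = (1.0 - c / (ell - 1.0) ** 2) / (1.0 + c / (ell - 1.0))

print(f"leading eigenvalue:  sample {lam:.3f}, predicted {lam_pred:.3f}")
print(f"squared overlap:     sample {np.dot(u, v)**2:.3f}, predicted {overlap_pred:.3f}")
```

With $p=200$, $n=400$ (so $c=0.5$) and spike $\ell=4$, both quantities should land close to their asymptotic predictions; lowering $\ell$ toward the threshold $1+\sqrt{c}\approx 1.71$ illustrates the loss of tracking described in the abstract, as the signal eigenvalue crosses into the bulk of noise eigenvalues.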
