How Close Are the Eigenvectors of the Sample and Actual Covariance Matrices?

How many samples are sufficient to guarantee that the eigenvectors of the sample covariance matrix are close to those of the actual covariance matrix? For a wide family of distributions, including distributions with finite second moment and sub-Gaussian distributions supported in a centered Euclidean ball, we prove that the inner product between non-corresponding eigenvectors of the sample and actual covariance matrices shrinks as both the distance between the respective eigenvalues and the number of samples grow. Our findings imply non-asymptotic concentration bounds for eigenvectors and eigenvalues and have strong consequences for the non-asymptotic analysis of PCA and its applications. For instance, they provide conditions for separating components estimated from O(1) samples and show that even a small number of samples can suffice for dimensionality reduction, especially when the covariance is low-rank.
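The qualitative claim can be illustrated numerically. The sketch below is not the paper's method and does not evaluate its bounds. It assumes zero-mean Gaussian samples with a diagonal covariance (so the actual eigenvectors are the standard basis vectors), and the helper names `mean_overlaps` and `leakage` are hypothetical. It averages the squared inner products between actual and sample eigenvectors over repeated draws and measures how much of that mass falls on non-corresponding pairs.

```python
import numpy as np


def mean_overlaps(eigvals, n, trials=50, seed=0):
    """Return the d x d matrix P with P[i, j] ~ E[<u_i, v_j>^2], estimated by
    averaging over `trials` draws, where u_i are the eigenvectors of the actual
    covariance diag(eigvals) and v_j those of the sample covariance of n
    zero-mean Gaussian samples (an illustrative assumption).

    Assumes distinct eigenvalues listed in decreasing order, so u_i = e_i.
    """
    rng = np.random.default_rng(seed)
    eigvals = np.asarray(eigvals, dtype=float)
    d = eigvals.size
    total = np.zeros((d, d))
    for _ in range(trials):
        # n samples from N(0, diag(eigvals)): scale i.i.d. N(0, 1) entries per column.
        X = rng.standard_normal((n, d)) * np.sqrt(eigvals)
        # Sample covariance with the mean known to be zero: (1/n) * X^T X.
        C = X.T @ X / n
        # eigh sorts eigenvalues in ascending order; reverse the eigenvector
        # columns so column j matches the j-th largest sample eigenvalue.
        _, V = np.linalg.eigh(C)
        V = V[:, ::-1]
        # Since u_i = e_i, the inner product <u_i, v_j> is just V[i, j];
        # squaring removes the arbitrary sign of each eigenvector.
        total += V ** 2
    return total / trials


def leakage(P):
    """Total squared overlap between non-corresponding eigenvector pairs."""
    return float(P.sum() - np.trace(P))


if __name__ == "__main__":
    spectrum = [8.0, 4.0, 2.0, 1.0]
    for n in (20, 200, 2000):
        print(f"n = {n:5d}  leakage = {leakage(mean_overlaps(spectrum, n)):.4f}")
    # Same n, small vs. large gap between the two leading eigenvalues.
    close = mean_overlaps([4.0, 3.8, 1.0], n=200)[0, 1]
    far = mean_overlaps([4.0, 1.0, 0.5], n=200)[0, 1]
    print(f"<u_1, v_2>^2: gap 0.2 -> {close:.4f}, gap 3.0 -> {far:.4f}")
```

With the spectrum [8, 4, 2, 1], the leakage should shrink as n grows. At a fixed n, the overlap between the first actual and second sample eigenvector should be much larger when the two leading eigenvalues are 4.0 and 3.8 than when they are 4.0 and 1.0. Fixing `seed` and averaging over `trials` draws keeps these comparisons reproducible.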