An $\ell_p$ theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of the individual principal component scores that yield low-dimensional embeddings of samples, and this gap hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces that provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel $\ell_p$ analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special cases. For sub-Gaussian mixture models, the choice of $p$ that gives optimal bounds depends on the signal-to-noise ratio, which in turn yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. As special cases, these results also yield optimal recovery guarantees for Gaussian mixture models and stochastic block models.
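
To make the pipeline concrete, below is a minimal sketch of spectral clustering via a hollowed Gram matrix: form the Gram matrix of the samples, zero out its diagonal (the "hollowing" step, which discards the diagonal entries where heteroscedastic noise concentrates), embed the samples through the leading eigenvectors, and cluster the embedding with Lloyd's k-means. This is an illustrative reconstruction under our own assumptions (selecting eigenvectors by absolute eigenvalue and using a plain random k-means initialization), not the paper's exact algorithm.

import numpy as np

def hollowed_spectral_clustering(X, k, n_iter=100, seed=0):
    """Cluster the n rows of X (n x d) into k groups using the top-k
    eigenvectors of the hollowed Gram matrix, followed by k-means.

    Illustrative sketch only; initialization and eigenvector selection
    are simplifying assumptions, not the paper's prescription.
    """
    n = X.shape[0]
    G = X @ X.T                      # Gram matrix of the samples
    np.fill_diagonal(G, 0.0)         # hollowing: drop the noise-dominated diagonal
    vals, vecs = np.linalg.eigh(G)   # eigenvalues returned in ascending order
    top = np.argsort(np.abs(vals))[::-1][:k]
    U = vecs[:, top]                 # n x k spectral embedding of the samples
    # Plain Lloyd's k-means with random initialization (illustrative only).
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(n, size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

On a well-separated sub-Gaussian mixture, the returned labels should recover the true clusters up to a permutation of cluster indices.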
