AN ℓp THEORY OF PCA AND SPECTRAL CLUSTERING

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of the individual principal component scores that yield low-dimensional embeddings of samples. This gap hinders the analysis of various spectral methods. In this paper, we first develop an ℓp perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel ℓp analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in ℓp norm, which includes ℓ2 and ℓ∞ as special cases. For sub-Gaussian mixture models, the choice of p giving the optimal bound depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the ℓp theory leads to simple spectral algorithms that achieve the information threshold for exact recovery and the optimal misclassification rate.

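To make the pipeline concrete, below is a minimal sketch of hollowed PCA followed by spectral clustering, under the assumption that "hollowing" means zeroing the diagonal of the Gram matrix (the diagonal is where heteroscedastic noise variances concentrate). The function names, the choice of k-means as the rounding step, and the toy Gaussian mixture are illustrative, not the paper's exact procedure.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def hollowed_pca_scores(X, k):
    """Top-k eigenvector scores of the hollowed Gram matrix.

    X : (n, d) data matrix, one sample per row.
    k : number of components / clusters.
    """
    G = X @ X.T                    # Gram matrix of the samples
    np.fill_diagonal(G, 0.0)       # "hollowing": drop the diagonal, which
                                   # carries the heteroscedastic noise bias
    vals, vecs = eigsh(G, k=k, which="LA")  # k largest eigenvalues
    return vecs                    # (n, k) principal component score vectors

def spectral_cluster(X, k, seed=0):
    """Cluster samples by running k-means on the hollowed PCA scores."""
    scores = hollowed_pca_scores(X, k)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scores)

# Toy usage: a two-component (here Gaussian, hence sub-Gaussian) mixture.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.zeros(50)
    mu[0] = 3.0                    # separation between the two means
    X = np.vstack([rng.normal(+mu, 1.0, (100, 50)),
                   rng.normal(-mu, 1.0, (100, 50))])
    labels = spectral_cluster(X, k=2)
```

The score vectors returned by `hollowed_pca_scores` are the objects whose entrywise (ℓp) behavior the theory characterizes; the k-means step simply rounds them to cluster labels.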