Rate-Optimal Perturbation Bounds for Singular Subspaces with Applications to High-Dimensional Statistics

Perturbation bounds for singular subspaces, in particular Wedin's $\sin \Theta$ theorem, are a fundamental tool in many fields, including high-dimensional statistics, machine learning, and applied mathematics. In this paper, we establish separate perturbation bounds, measured in both spectral and Frobenius $\sin \Theta$ distances, for the left and right singular subspaces. We also give lower bounds showing that the individual perturbation bounds are rate-optimal. The new perturbation bounds apply to a wide range of problems. We consider in detail applications to low-rank matrix denoising and singular subspace estimation, high-dimensional clustering, and canonical correlation analysis (CCA). In particular, we obtain separate matching upper and lower bounds for estimating the left and right singular subspaces. To the best of our knowledge, this is the first result that gives different optimal rates for the left and right singular subspaces under the same perturbation. Applications to other high-dimensional problems, such as community detection in bipartite networks, multidimensional scaling, and cross-covariance matrix estimation, are also discussed.
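
As a rough illustration of the quantities discussed above, the following sketch computes spectral and Frobenius $\sin \Theta$ distances separately for the left and right singular subspaces of a noisy low-rank matrix. It is not taken from the paper: the dimensions `p1`, `p2`, the rank `r`, the singular values, the Gaussian noise model, and the helper names `sin_theta_distances` and `top_singular_subspaces` are all assumptions chosen for the example. It uses the standard fact that the singular values of $V^\top \hat{V}$ are the cosines of the principal angles between the two subspaces.

```python
import numpy as np

def sin_theta_distances(V, V_hat):
    """Return (spectral, Frobenius) sin-Theta distances between the column
    spaces of V and V_hat. Both inputs are assumed to have orthonormal
    columns and the same shape."""
    # Singular values of V^T V_hat are the cosines of the principal angles.
    cosines = np.linalg.svd(V.T @ V_hat, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against round-off above 1
    sines = np.sqrt(1.0 - cosines**2)
    return sines.max(), np.sqrt(np.sum(sines**2))

def top_singular_subspaces(A, r):
    """Leading r left and right singular vectors of A (hypothetical helper)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], Vt[:r, :].T

# Illustrative setup (assumed, not from the paper): a tall rank-2 signal
# matrix observed with i.i.d. standard Gaussian noise.
rng = np.random.default_rng(0)
p1, p2, r = 400, 20, 2
U, _ = np.linalg.qr(rng.standard_normal((p1, r)))  # true left subspace
V, _ = np.linalg.qr(rng.standard_normal((p2, r)))  # true right subspace
X = U @ np.diag([60.0, 50.0]) @ V.T
Y = X + rng.standard_normal((p1, p2))

U_hat, V_hat = top_singular_subspaces(Y, r)
left_spec, left_frob = sin_theta_distances(U, U_hat)
right_spec, right_frob = sin_theta_distances(V, V_hat)

print(f"left : spectral={left_spec:.3f}, Frobenius={left_frob:.3f}")
print(f"right: spectral={right_spec:.3f}, Frobenius={right_frob:.3f}")
```

With a strongly rectangular matrix such as this one, the two sides would typically be expected to show noticeably different errors, which is the kind of asymmetry that separate left and right bounds are meant to capture; a single joint bound would report only the larger of the two.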
