Tight query complexity lower bounds for PCA via finite sample deformed wigner law

We prove a query complexity lower bound for approximating the top r dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix M ∈ ℝd × d, an algorithm Alg is allowed to make T exact queries of the form w(i) = M v(i) for i in {1,...,T}, where v(i) is drawn from a distribution which depends arbitrarily on the past queries and measurements {v(j),w(i)}1 ≤ j ≤ i−1. We show that for every gap ∈ (0,1/2], there exists a distribution over matrices M for which 1) gapr(M) = Ω(gap) (where gapr(M) is the normalized gap between the r and r+1-st largest-magnitude eigenvector of M), and 2) any Alg which takes fewer than const × r logd/√gap queries fails (with overwhelming probability) to identity a matrix V ∈ ℝd × r with orthonormal columns for which ⟨ V, M V⟩ ≥ (1 − const × gap)∑i=1r λi(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M =W + λ U U⊤, where W is from the Gaussian Orthogonal Ensemble, U is uniform on the d × r-Stieffel manifold and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ gap−1/2 in the rate follows as a consequence of the correspendence between the asymptotic eigengap and the size of the perturbation λ, when λ is near the “phase transition” λ = 1. To verify that d need only be polynomial in gap−1 and r, we prove a finite sample convergence theorem for top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.

[1]  A. Bandeira,et al.  Sharp nonasymptotic bounds on the norm of random matrices with independent entries , 2014, 1408.6185.

[2]  Martin J. Wainwright,et al.  Information-theoretic lower bounds on the oracle complexity of convex optimization , 2009, NIPS.

[3]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[4]  Robert D. Nowak,et al.  Query Complexity of Derivative-Free Optimization , 2012, NIPS.

[5]  C. Donati-Martin,et al.  The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. , 2007, 0706.0136.

[6]  C. Tracy,et al.  Introduction to Random Matrices , 1992, hep-th/9210073.

[7]  Sham M. Kakade,et al.  Faster Eigenvector Computation via Shift-and-Invert Preconditioning , 2016, ICML.

[8]  Rahul Jain,et al.  Lifting randomized query complexity to randomized communication complexity , 2017, Electron. Colloquium Comput. Complex..

[9]  Emmanuel J. Candès,et al.  On the Fundamental Limits of Adaptive Sensing , 2011, IEEE Transactions on Information Theory.

[10]  James Demmel,et al.  Applied Numerical Linear Algebra , 1997 .

[11]  R. Castro Adaptive sensing performance lower bounds for sparse signal detection and support estimation , 2012, 1206.0648.

[12]  Rui M. Castro,et al.  Adaptive Compressed Sensing for Support Recovery of Structured Sparse Sets , 2014, IEEE Transactions on Information Theory.

[13]  Léon Bottou,et al.  A Lower Bound for the Optimization of Finite Sums , 2014, ICML.

[14]  Daniel A. Spielman,et al.  Spectral Graph Theory and its Applications , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[15]  Ohad Shamir,et al.  Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity , 2015, ICML.

[16]  John Darzentas,et al.  Problem Complexity and Method Efficiency in Optimization , 1983 .

[17]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[18]  P. Massart,et al.  Adaptive estimation of a quadratic functional by model selection , 2000 .

[19]  David P. Woodruff,et al.  Lower Bounds for Adaptive Sparse Recovery , 2012, SODA.

[20]  Gábor Lugosi,et al.  Concentration Inequalities - A Nonasymptotic Theory of Independence , 2013, Concentration Inequalities.

[21]  O. Kallenberg Foundations of Modern Probability , 2021, Probability Theory and Stochastic Modelling.

[22]  Friedrich Liese $\phi $PHI-divergences, sufficiency, Bayes sufficiency, and deficiency , 2012 .

[23]  Aditya Guntuboyina Lower Bounds for the Minimax Risk Using $f$-Divergences, and Applications , 2011, IEEE Transactions on Information Theory.

[24]  Cameron Musco,et al.  Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition , 2015, NIPS.

[25]  Ohad Shamir,et al.  Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation , 2013, NIPS.

[26]  Yuanzhi Li,et al.  Even Faster SVD Decomposition Yet Without Agonizing Pain , 2016, NIPS.

[27]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[28]  Gregory Valiant,et al.  Memory, Communication, and Statistical Queries , 2016, COLT.

[29]  John C. Duchi,et al.  Minimax rates for memory-bounded sparse linear regression , 2015, COLT.

[30]  Jakub W. Pachocki,et al.  Optimal lower bounds for universal relation, samplers, and finding duplicates , 2017, ArXiv.

[31]  D. Féral,et al.  The Largest Eigenvalue of Rank One Deformation of Large Wigner Matrices , 2006, math/0605624.

[32]  Friedrich Liese phi-divergences, sufficiency, Bayes sufficiency, and deficiency , 2012, Kybernetika.

[33]  Nathan Srebro,et al.  Tight Complexity Bounds for Optimizing Composite Objectives , 2016, NIPS.

[34]  I. Csiszár A class of measures of informativity of observation channels , 1972 .

[35]  Yuanzhi Li,et al.  First Efficient Convergence for Streaming k-PCA: A Global, Gap-Free, and Near-Optimal Rate , 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[36]  Michael I. Jordan,et al.  How to Escape Saddle Points Efficiently , 2017, ICML.

[37]  Xi Chen,et al.  On Bayes Risk Lower Bounds , 2014, J. Mach. Learn. Res..

[38]  Raj Rao Nadakuditi,et al.  The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices , 2009, 0910.2120.