Paper information - Tight query complexity lower bounds for PCA via finite sample deformed Wigner law

We prove a query complexity lower bound for approximating the top $r$-dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix $M \in \mathbb{R}^{d \times d}$, an algorithm Alg is allowed to make $T$ exact queries of the form $w^{(i)} = M v^{(i)}$ for $i \in \{1,\dots,T\}$, where $v^{(i)}$ is drawn from a distribution which depends arbitrarily on the past queries and measurements $\{v^{(j)}, w^{(j)}\}_{1 \le j \le i-1}$. We show that for every $\mathrm{gap} \in (0,1/2]$, there exists a distribution over matrices $M$ for which 1) $\mathrm{gap}_r(M) = \Omega(\mathrm{gap})$ (where $\mathrm{gap}_r(M)$ is the normalized gap between the $r$-th and $(r+1)$-st largest-magnitude eigenvalues of $M$), and 2) any Alg which makes fewer than $\mathrm{const} \times r \log d / \sqrt{\mathrm{gap}}$ queries fails (with overwhelming probability) to identify a matrix $V \in \mathbb{R}^{d \times r}$ with orthonormal columns for which $\langle V, M V \rangle \ge (1 - \mathrm{const} \times \mathrm{gap}) \sum_{i=1}^{r} \lambda_i(M)$. Our bound requires only that $d$ is a small polynomial in $1/\mathrm{gap}$ and $r$, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-$r$ spike in a deformed Wigner model $M = W + \lambda U U^\top$, where $W$ is from the Gaussian Orthogonal Ensemble, $U$ is uniform on the $d \times r$ Stiefel manifold, and $\lambda > 1$ governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ $\mathrm{gap}^{-1/2}$ in the rate follows as a consequence of the correspondence between the asymptotic eigengap and the size of the perturbation $\lambda$, when $\lambda$ is near the “phase transition” $\lambda = 1$.
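As a rough illustration of that correspondence, the following heuristic uses the standard asymptotics for a spiked Wigner matrix. It assumes the GOE is normalized so that its bulk spectrum fills $[-2, 2]$, and that the normalized gap is $(\lambda_r - \lambda_{r+1})/\lambda_r$; neither convention is spelled out in the abstract, and this is a sketch rather than the paper's argument. For $\lambda > 1$ the top $r$ eigenvalues concentrate near $\lambda + \lambda^{-1}$ while the $(r+1)$-st stays near the bulk edge $2$, which suggests:

```latex
\[
\mathrm{gap}
\;\approx\; \frac{(\lambda + \lambda^{-1}) - 2}{\lambda + \lambda^{-1}}
\;=\; \frac{(\lambda - 1)^2}{\lambda^2 + 1},
\qquad\text{so for } \lambda = 1 + \varepsilon:\quad
\mathrm{gap} \approx \frac{\varepsilon^2}{2},
\quad
\varepsilon \approx \sqrt{2\,\mathrm{gap}} .
\]
```

Under these assumptions the eigengap is quadratic in the distance $\varepsilon = \lambda - 1$ from the phase transition, so a spike that exceeds the threshold by only order $\sqrt{\mathrm{gap}}$ already produces an eigengap of order $\mathrm{gap}$. This is one way to see where a square root of the gap can enter the rate.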
To verify that $d$ need only be polynomial in $\mathrm{gap}^{-1}$ and $r$, we prove a finite sample convergence theorem for the top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
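The setting described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the paper's construction or proof: it samples a deformed Wigner matrix $M = W + \lambda U U^\top$, wraps it in a matrix-vector oracle that counts queries, and runs a plain block power method as a stand-in for an adaptive algorithm (it is not the block Krylov method of Musco and Musco). All names (`sample_deformed_wigner`, `MatVecOracle`, `block_power_method`) and the parameter choices ($d = 400$, $r = 2$, $\lambda = 2$, 50 rounds) are hypothetical; the GOE normalization with bulk edge at 2 is an assumption, and $\lambda = 2$ is deliberately far from the phase transition so that the example converges quickly.

```python
import numpy as np


def sample_deformed_wigner(d, r, lam, rng):
    """Draw M = W + lam * U U^T (assumed normalization: GOE bulk edge near 2)."""
    G = rng.standard_normal((d, d))
    W = (G + G.T) / np.sqrt(2 * d)  # symmetric, off-diagonal variance 1/d
    Q, R = np.linalg.qr(rng.standard_normal((d, r)))
    U = Q * np.sign(np.diag(R))  # sign fix so U is uniform on the Stiefel manifold
    return W + lam * U @ U.T, U


class MatVecOracle:
    """Exact query oracle v -> M v that counts how many queries were made."""

    def __init__(self, M):
        self._M = M
        self.num_queries = 0

    def query(self, v):
        self.num_queries += 1
        return self._M @ v


def block_power_method(oracle, d, r, num_rounds, rng):
    """Adaptive algorithm: each round spends r queries on the current basis."""
    V, _ = np.linalg.qr(rng.standard_normal((d, r)))
    for _ in range(num_rounds):
        MV = np.column_stack([oracle.query(V[:, j]) for j in range(r)])
        V, _ = np.linalg.qr(MV)  # re-orthonormalize the block
    return V


rng = np.random.default_rng(0)
d, r, lam, num_rounds = 400, 2, 2.0, 50
M, U = sample_deformed_wigner(d, r, lam, rng)
oracle = MatVecOracle(M)
V = block_power_method(oracle, d, r, num_rounds, rng)

# Evaluation only: compare <V, M V> with the sum of the top r eigenvalues.
top_eigs = np.linalg.eigvalsh(M)[-r:]
ratio = np.trace(V.T @ M @ V) / top_eigs.sum()

print("queries used:", oracle.num_queries)
print("top eigenvalues:", top_eigs, "vs lam + 1/lam =", lam + 1 / lam)
print("<V, M V> / sum of top-r eigenvalues:", ratio)
```

Moving `lam` toward 1 shrinks the eigengap and should slow the convergence of this simple method; the lower bound in the abstract says that no query strategy, however adaptive, can avoid a cost of order $r \log d / \sqrt{\mathrm{gap}}$ on such instances.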

[1] A. Bandeira, et al. Sharp nonasymptotic bounds on the norm of random matrices with independent entries, 2014, arXiv:1408.6185.

[2] Martin J. Wainwright, et al. Information-theoretic lower bounds on the oracle complexity of convex optimization, 2009, NIPS.

[3] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[4] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.

[5] C. Donati-Martin, et al. The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations, 2007, arXiv:0706.0136.

[6] C. Tracy, et al. Introduction to Random Matrices, 1992, arXiv:hep-th/9210073.

[7] Sham M. Kakade, et al. Faster Eigenvector Computation via Shift-and-Invert Preconditioning, 2016, ICML.

[8] Rahul Jain, et al. Lifting randomized query complexity to randomized communication complexity, 2017, Electron. Colloquium Comput. Complex.

[9] Emmanuel J. Candès, et al. On the Fundamental Limits of Adaptive Sensing, 2011, IEEE Transactions on Information Theory.

[10] James Demmel, et al. Applied Numerical Linear Algebra, 1997.

[11] R. Castro. Adaptive sensing performance lower bounds for sparse signal detection and support estimation, 2012, arXiv:1206.0648.

[12] Rui M. Castro, et al. Adaptive Compressed Sensing for Support Recovery of Structured Sparse Sets, 2014, IEEE Transactions on Information Theory.

[13] Léon Bottou, et al. A Lower Bound for the Optimization of Finite Sums, 2014, ICML.

[14] Daniel A. Spielman, et al. Spectral Graph Theory and its Applications, 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[15] Ohad Shamir, et al. Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity, 2015, ICML.

[16] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.

[17] Heng Tao Shen, et al. Principal Component Analysis, 2009, Encyclopedia of Biometrics.

[18] P. Massart, et al. Adaptive estimation of a quadratic functional by model selection, 2000.

[19] David P. Woodruff, et al. Lower Bounds for Adaptive Sparse Recovery, 2012, SODA.

[20] Gábor Lugosi, et al. Concentration Inequalities - A Nonasymptotic Theory of Independence, 2013, Concentration Inequalities.

[21] O. Kallenberg. Foundations of Modern Probability, 2021, Probability Theory and Stochastic Modelling.

[22] Friedrich Liese. $\phi$-divergences, sufficiency, Bayes sufficiency, and deficiency, 2012.

[23] Aditya Guntuboyina. Lower Bounds for the Minimax Risk Using $f$-Divergences, and Applications, 2011, IEEE Transactions on Information Theory.

[24] Cameron Musco, et al. Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition, 2015, NIPS.

[25] Ohad Shamir, et al. Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation, 2013, NIPS.

[26] Yuanzhi Li, et al. Even Faster SVD Decomposition Yet Without Agonizing Pain, 2016, NIPS.