
Sum-of-Squares Lower Bounds for Sparse PCA

This paper establishes a statistical versus computational trade-off for solving a basic high-dimensional machine learning problem via a basic convex relaxation method. Specifically, we consider the Sparse Principal Component Analysis (Sparse PCA) problem and the family of Sum-of-Squares (SoS, also known as Lasserre/Parrilo) convex relaxations. It is well known that in large dimension $p$, a planted $k$-sparse unit vector can in principle be detected using only $n \approx k\log p$ (Gaussian or Bernoulli) samples, but all known efficient (polynomial-time) algorithms require $n \approx k^2$ samples. It was also known that this quadratic gap cannot be improved by the most basic semidefinite (SDP, also known as spectral) relaxation, which is equivalent to degree-2 SoS. Here we prove that degree-4 SoS algorithms cannot improve this quadratic gap either. This average-case lower bound adds to the small collection of hardness results in machine learning for this powerful family of convex relaxation algorithms. Moreover, our design of moments (or "pseudo-expectations") for this lower bound is quite different from that of previous lower bounds. Establishing lower bounds for higher-degree SoS algorithms remains a challenging problem.
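To make the "in principle" detection concrete, here is a minimal sketch in plain Python. It assumes the common spiked covariance model, in which planted samples have covariance I + θ·v·vᵀ for a k-sparse unit vector v with equal entries 1/√k on a random support; the abstract only says "Gaussian or Bernoulli" samples, so this model, the value of θ, and all function names (sample_covariance, top_eigenvalue, sparse_eigenvalue) are hypothetical choices for illustration. The statistic is the k-sparse largest eigenvalue of the empirical covariance, found by exhaustive search over all supports. It is not the SoS relaxation analyzed in the paper.

```python
import itertools
import math
import random


def sample_covariance(n, p, k, theta, planted, rng):
    """Empirical covariance (1/n) * sum x x^T of n samples in R^p.

    Null: x ~ N(0, I_p).
    Planted (assumed spiked model): x = z + sqrt(theta) * g * v with
    z ~ N(0, I_p), g ~ N(0, 1) and v a k-sparse unit vector,
    so that Cov(x) = I_p + theta * v v^T.
    """
    support = set(rng.sample(range(p), k))
    v = [1 / math.sqrt(k) if i in support else 0.0 for i in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for _ in range(n):
        g = rng.gauss(0.0, 1.0) if planted else 0.0
        x = [rng.gauss(0.0, 1.0) + math.sqrt(theta) * g * v[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += x[i] * x[j] / n
    return cov


def top_eigenvalue(mat, iters=200, seed=0):
    """Largest eigenvalue of a small symmetric PSD matrix via power iteration."""
    m = len(mat)
    start = random.Random(seed)
    u = [start.gauss(0.0, 1.0) for _ in range(m)]  # random start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * u[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return 0.0
        u = [t / norm for t in w]
    # Rayleigh quotient of the unit-norm iterate
    return sum(u[i] * mat[i][j] * u[j] for i in range(m) for j in range(m))


def sparse_eigenvalue(cov, k):
    """max over |S| = k of lambda_max(cov[S, S]); brute force over C(p, k) supports."""
    p = len(cov)
    best = -math.inf
    for s in itertools.combinations(range(p), k):
        sub = [[cov[i][j] for j in s] for i in s]
        best = max(best, top_eigenvalue(sub))
    return best


rng = random.Random(1)
null_stat = sparse_eigenvalue(sample_covariance(500, 10, 3, 4.0, False, rng), 3)
planted_stat = sparse_eigenvalue(sample_covariance(500, 10, 3, 4.0, True, rng), 3)
print(f"null: {null_stat:.2f}  planted: {planted_stat:.2f}")
```

Under the planted model the statistic concentrates near 1 + θ, while under the null it stays close to 1, so a threshold between them separates the two cases. The catch is the loop over all C(p, k) supports, which is exponential in k; efficient methods such as the SDP (degree-2 SoS) relaxation replace this search with a convex program and, per the abstract, then need on the order of k² samples rather than k log p.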

Avi Wigderson | Tengyu Ma
