Sum-of-Squares Lower Bounds for Sparse PCA

This paper establishes a statistical versus computational trade-off for solving a basic high-dimensional machine learning problem via a basic convex relaxation method. Specifically, we consider the {\em Sparse Principal Component Analysis} (Sparse PCA) problem, and the family of {\em Sum-of-Squares} (SoS, aka Lasserre/Parrilo) convex relaxations. It was well known that in large dimension $p$, a planted $k$-sparse unit vector can be {\em in principle} detected using only $n \approx k\log p$ (Gaussian or Bernoulli) samples, but all known {\em efficient} (polynomial time) algorithms require $n \approx k^2$ samples. It was also known that this quadratic gap cannot be improved by the most basic {\em semi-definite} (SDP, aka spectral) relaxation, which is equivalent to degree-2 SoS algorithms. Here we prove that degree-4 SoS algorithms also cannot improve this quadratic gap. This average-case lower bound adds to the small collection of hardness results in machine learning for this powerful family of convex relaxation algorithms. Moreover, our design of moments (or "pseudo-expectations") for this lower bound is quite different from those used in previous lower bounds. Establishing lower bounds for higher degree SoS algorithms remains a challenging problem.
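To make the planted setting concrete, the following is a minimal sketch of how samples with a hidden $k$-sparse unit vector might be generated. It assumes a Gaussian spiked-covariance model with covariance $I + \beta vv^T$, which the abstract does not spell out; the names `planted_sparse_vector`, `draw_samples`, `quadratic_form`, and the signal strength `beta` are hypothetical and chosen only for illustration. The sketch does not implement any SoS relaxation.

```python
import math
import random


def planted_sparse_vector(p, k, rng):
    """Return a k-sparse unit vector in R^p with entries +-1/sqrt(k) on a random support."""
    support = rng.sample(range(p), k)
    v = [0.0] * p
    for i in support:
        v[i] = rng.choice((-1.0, 1.0)) / math.sqrt(k)
    return v


def draw_samples(v, n, beta, rng):
    """Draw n samples x = sqrt(beta) * g * v + z with g ~ N(0, 1) and z ~ N(0, I_p).

    Assumption: this spiked-covariance form (covariance I + beta * v v^T)
    stands in for the Gaussian sampling model mentioned in the abstract.
    """
    p = len(v)
    scale = math.sqrt(beta)
    samples = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)
        samples.append([scale * g * v[i] + rng.gauss(0.0, 1.0) for i in range(p)])
    return samples


def quadratic_form(samples, u):
    """Compute u^T Sigma_hat u, where Sigma_hat is the empirical second-moment matrix."""
    total = 0.0
    for x in samples:
        inner = sum(xi * ui for xi, ui in zip(x, u))
        total += inner * inner
    return total / len(samples)


rng = random.Random(0)
p, k, n, beta = 200, 5, 400, 2.0  # illustrative sizes only
v = planted_sparse_vector(p, k, rng)
X = draw_samples(v, n, beta, rng)
planted_value = quadratic_form(X, v)
```

Under this assumed model, `planted_value` concentrates around $1 + \beta$, while the same quantity for a fixed unit vector orthogonal to $v$ concentrates around $1$. Detection asks whether some $k$-sparse direction exhibits such an excess; searching over all $k$-sparse directions is the step that is expensive in general.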
