Optimal Average-Case Reductions to Sparse PCA: From Weak Assumptions to Strong Hardness

In the past decade, sparse principal component analysis has emerged as an archetypal problem for illustrating statistical-computational tradeoffs. This trend has largely been driven by a line of research aiming to characterize the average-case complexity of sparse PCA through reductions from the planted clique (PC) conjecture, which posits that no polynomial-time algorithm can detect a planted clique of size $K = o(N^{1/2})$ in $\mathcal{G}(N, \frac{1}{2})$. All previous reductions to sparse PCA either fail to show tight computational lower bounds matching existing algorithms or show lower bounds for formulations of sparse PCA other than its canonical generative model, the spiked covariance model. Moreover, these lower bounds all degrade quickly as the exponent in the PC conjecture decreases. Specifically, when the PC conjecture is assumed only up to $K = o(N^\alpha)$ for some $\alpha < 1/2$, there is no sparsity level $k$ at which these lower bounds remain tight. If $\alpha \le 1/3$, these reductions fail to show even the existence of a statistical-computational tradeoff at any sparsity $k$. We give a reduction from PC that yields the first full characterization of the computational barrier in the spiked covariance model, providing tight lower bounds at all sparsities $k$. We also show the surprising result that weaker forms of the PC conjecture, up to clique size $K = o(N^\alpha)$ for any given $\alpha \in (0, 1/2]$, imply tight computational lower bounds for sparse PCA at sparsities $k = o(n^{\alpha/3})$. This shows that even a mild improvement in the signal strength needed by the best known polynomial-time sparse PCA algorithms would imply that the hardness threshold for PC is subpolynomial. This is the first instance of a suboptimal hardness assumption implying optimal lower bounds for another problem in unsupervised learning.
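To make the detection problem behind the PC conjecture concrete, the following is a minimal sketch of how an instance could be sampled: draw a graph from $\mathcal{G}(N, \frac{1}{2})$ and then force all edges among $K$ randomly chosen vertices. The function name `sample_planted_clique` and its parameters are illustrative assumptions, not part of the paper, and the sketch only generates instances; it does not implement the reduction to sparse PCA.

```python
import random
from itertools import combinations


def sample_planted_clique(n_vertices, clique_size, seed=None):
    """Sample a graph from G(N, 1/2) with a planted clique (hypothetical helper).

    Returns (adj, clique), where adj is a symmetric boolean adjacency matrix
    and clique is the sorted list of planted vertices. With clique_size=0
    this is a sample from the null model G(N, 1/2).
    """
    rng = random.Random(seed)
    adj = [[False] * n_vertices for _ in range(n_vertices)]

    # Null part: include each possible edge independently with probability 1/2.
    for u, v in combinations(range(n_vertices), 2):
        if rng.random() < 0.5:
            adj[u][v] = adj[v][u] = True

    # Planted part: choose K vertices uniformly at random and connect all pairs.
    clique = sorted(rng.sample(range(n_vertices), clique_size))
    for u, v in combinations(clique, 2):
        adj[u][v] = adj[v][u] = True

    return adj, clique
```

For example, `sample_planted_clique(1000, 20, seed=0)` yields a graph in which the 20 returned vertices are pairwise adjacent. The detection task is to distinguish such a graph from one sampled with `clique_size=0`.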
