Sparse PCA: A New Scalable Estimator Based On Integer Programming

We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that SPCA can be reformulated as a Mixed Integer Program (MIP) and solved to global optimality, leading to estimators known to enjoy optimal statistical properties. However, current MIP algorithms for SPCA cannot scale beyond instances with roughly a thousand features. In this paper, we propose a new estimator for SPCA that can be formulated as a MIP. In contrast to earlier work, we exploit the underlying spiked covariance model and properties of the multivariate Gaussian distribution to arrive at our estimator. We establish statistical guarantees for the proposed estimator in terms of estimation error and support recovery. We develop a custom algorithm to solve the MIP that is significantly more scalable than off-the-shelf solvers, and we demonstrate that our approach is considerably more computationally attractive than earlier exact MIP-based approaches for SPCA. Numerical experiments on synthetic and real datasets show that our algorithms can address problems with up to 20,000 features in minutes, and generally deliver favorable statistical properties compared to popular existing approaches for SPCA.
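For concreteness, the following is a minimal sketch of the standard formulation the abstract refers to; the specific parameterization (spike strength \(\theta\), sparsity level \(s\)) is an assumption here and may differ from the one adopted in the paper. Under the spiked covariance model, samples \(x_1, \dots, x_n\) are drawn i.i.d. from \(N(0, \Sigma)\) with

\[
\Sigma \;=\; \theta\,\beta\beta^{\top} + I_p, \qquad \|\beta\|_2 = 1, \qquad \|\beta\|_0 \le s,
\]

and SPCA estimates the sparse spike \(\beta\) by solving

\[
\max_{v \in \mathbb{R}^p} \; v^{\top}\widehat{\Sigma}\,v \quad \text{subject to} \quad \|v\|_2 = 1, \;\; \|v\|_0 \le s,
\]

where \(\widehat{\Sigma}\) denotes the sample covariance matrix. The cardinality constraint \(\|v\|_0 \le s\) is what admits a MIP reformulation: since \(\|v\|_2 = 1\) implies \(|v_j| \le 1\), one may introduce binary indicators \(z_j \in \{0,1\}\) with \(|v_j| \le z_j\) and \(\sum_{j=1}^{p} z_j \le s\).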
