Principal Component Hierarchy for Sparse Quadratic Programs

We propose a novel approximation hierarchy for cardinality-constrained convex quadratic programs that exploits the rank-dominating eigenvectors of the quadratic matrix. Each level of the hierarchy admits a min-max characterization whose objective function can be optimized over the binary variables analytically while preserving convexity in the continuous variables. Exploiting this property, we propose two scalable optimization algorithms, coined the “best response” and the “dual program”, that can efficiently screen the potential indices of the nonzero elements of the original program. We show that the proposed methods are competitive with existing screening methods in the sparse regression literature, and are particularly fast on instances with a large number of measurements, in experiments with both synthetic and real datasets.
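
To make the screening idea concrete, the sketch below illustrates one plausible reading of the approach on a sparse ridge regression instance: the quadratic matrix Q = AᵀA is replaced by its r dominant eigenpairs, and a best-response-style loop alternates between a ridge solve on the current support and an analytic binary step that keeps the k highest-scoring coordinates. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name, the scoring rule (an iterative-hard-thresholding-style gradient step on the rank-r model), and all parameters are illustrative.

```python
import numpy as np

def screen_support(A, b, k, r, gamma=1.0, n_iters=50):
    """Hypothetical best-response-style screening sketch.

    Replaces Q = A^T A by its r dominant eigenpairs and alternates between
    (i) a ridge solve on the current candidate support and (ii) an analytic
    binary step that keeps the k highest-scoring coordinates.
    """
    n = A.shape[1]
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    lam, V = eigvals[-r:], eigvecs[:, -r:]    # dominant eigenpairs (eigh is ascending)
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # dense warm start
    support = np.sort(np.argsort(-np.abs(x))[:k])
    for _ in range(n_iters):
        # continuous best response: ridge subproblem restricted to the support
        As = A[:, support]
        xs = np.linalg.solve(As.T @ As + gamma * np.eye(k), As.T @ b)
        x = np.zeros(n)
        x[support] = xs
        # analytic binary step: gradient of the rank-r quadratic model ...
        grad = V @ (lam * (V.T @ x)) + gamma * x - A.T @ b
        # ... scored by an IHT-style gradient step with 1/L step size,
        # so the top-k selection has a closed form
        scores = np.abs(x - grad / (lam[-1] + gamma))
        new_support = np.sort(np.argsort(-scores)[:k])
        if np.array_equal(new_support, support):
            break
        support = new_support
    return support
```

The returned indices would serve only as a screened candidate support; an exact solver could then be run over those coordinates alone. A toy call, with synthetic data:

```python
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(500)
print(screen_support(A, b, k=5, r=10))  # expected to recover indices 0..4
```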
