Regularization vs. Relaxation: A conic optimization perspective of statistical variable selection

Variable selection is a fundamental task in statistical data analysis. Sparsity-inducing regularization methods are a popular class of methods that simultaneously perform variable selection and model estimation. The central problem is a quadratic optimization problem with an ℓ0-norm penalty. Exactly enforcing the ℓ0-norm penalty is computationally intractable for larger-scale problems, so different sparsity-inducing penalty functions that approximate the ℓ0-norm have been introduced. In this paper, we show that viewing the problem from a convex relaxation perspective offers new insights. In particular, we show that a popular sparsity-inducing concave penalty function known as the Minimax Concave Penalty (MCP), and the reverse Huber penalty derived in a recent work by Pilanci, Wainwright and Ghaoui, can both be derived as special cases of a lifted convex relaxation called the perspective relaxation. The optimal perspective relaxation is a related minimax problem that balances the overall convexity and tightness of approximation to the ℓ0-norm. We show it can be solved by a semidefinite relaxation. Moreover, a probabilistic interpretation of the semidefinite relaxation reveals connections with the boolean quadric polytope in combinatorial optimization. Finally, by reformulating the ℓ0-norm penalized problem as a two-level problem, with the inner level being a Max-Cut problem, our proposed semidefinite relaxation can be realized by replacing the inner level problem with its semidefinite relaxation studied by Goemans and Williamson. This interpretation suggests using the Goemans-Williamson rounding procedure to find approximate solutions to the ℓ0-norm penalized problem. Numerical ex-
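To make the two penalties named in the abstract concrete, the following is a minimal sketch of their standard scalar forms, together with a numerical check of one well-known one-dimensional perspective identity: minimizing (t^2/z + z)/2 over z in (0, 1] gives the reverse Huber function. The function names, the parameters `lam` and `gamma`, and the grid-search check are illustrative assumptions; they are not taken from the paper, and the paper's own derivation may use a different scaling.

```python
import math


def mcp(t, lam, gamma):
    """Minimax Concave Penalty in its usual scalar form (lam > 0, gamma > 0)."""
    a = abs(t)
    if a <= gamma * lam:
        # Concave quadratic region near zero.
        return lam * a - a * a / (2.0 * gamma)
    # Flat region: the penalty stops growing, as the l0 penalty does.
    return 0.5 * gamma * lam * lam


def reverse_huber(t):
    """Reverse Huber: absolute value near zero, quadratic growth beyond 1."""
    a = abs(t)
    return a if a <= 1.0 else 0.5 * (a * a + 1.0)


def perspective_min(t, grid=10000):
    """Grid-search minimum over z in (0, 1] of 0.5 * (t^2 / z + z).

    This is a numerical illustration only (an assumed helper), not a solver.
    """
    best = math.inf
    for k in range(1, grid + 1):
        z = k / grid
        best = min(best, 0.5 * (t * t / z + z))
    return best


if __name__ == "__main__":
    for t in (0.3, 0.5, 2.0):
        print(t, reverse_huber(t), perspective_min(t))
    print(mcp(10.0, lam=1.0, gamma=3.0))
```

For |t| <= 1 the minimizing z equals |t|, and for |t| > 1 it is clipped at z = 1, which is where the switch from linear to quadratic growth comes from. The MCP, by contrast, flattens out beyond |t| = gamma * lam, so large coefficients incur a constant penalty.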
