Regularization vs. Relaxation: A conic optimization perspective of statistical variable selection

Variable selection is a fundamental task in statistical data analysis. Sparsity-inducing regularization methods are a popular class of methods that simultaneously perform variable selection and model estimation. The central problem is a quadratic optimization problem with an ℓ0-norm penalty. Exactly enforcing the ℓ0-norm penalty is computationally intractable for larger scale problems, so different sparsity-inducing penalty functions that approximate the ℓ0-norm have been introduced. In this paper, we show that viewing the problem from a convex relaxation perspective offers new insights. In particular, we show that a popular sparsity-inducing concave penalty function known as the Minimax Concave Penalty (MCP), and the reverse Huber penalty derived in a recent work by Pilanci, Wainwright and Ghaoui, can both be derived as special cases of a lifted convex relaxation called the perspective relaxation. The optimal perspective relaxation is a related minimax problem that balances the overall convexity and tightness of approximation to the ℓ0-norm. We show it can be solved by a semidefinite relaxation. Moreover, a probabilistic interpretation of the semidefinite relaxation reveals connections with the boolean quadric polytope in combinatorial optimization. Finally, by reformulating the ℓ0-norm penalized problem as a two-level problem, with the inner level being a Max-Cut problem, our proposed semidefinite relaxation can be realized by replacing the inner level problem with its semidefinite relaxation studied by Goemans and Williamson. This interpretation suggests using the Goemans-Williamson rounding procedure to find approximate solutions to the ℓ0-norm penalized problem. Numerical ex-
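The link the abstract draws between the perspective relaxation, the reverse Huber penalty, and MCP can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions, not necessarily the paper's own derivation: it assumes a single coefficient t with an indicator z relaxed to (0, 1], an ℓ0 weight `lam0`, and a separable quadratic term `delta * t**2` pulled out of the objective. The function names and parameter names are hypothetical. Minimizing `lam0*z + delta*t**2/z` over z gives a reverse-Huber-shaped function, and subtracting `delta * t**2` from it gives MCP (in the standard form of Zhang, 2010) with `lam = 2*sqrt(lam0*delta)` and `gamma = 1/(2*delta)`.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax Concave Penalty in its standard form, evaluated elementwise."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def reverse_huber(t, lam0, delta):
    """Closed form of min over z in (0, 1] of lam0*z + delta*t**2/z."""
    a = np.abs(t)
    return np.where(a <= np.sqrt(lam0 / delta),
                    2.0 * np.sqrt(lam0 * delta) * a,  # interior minimizer z < 1
                    lam0 + delta * a**2)              # minimizer clipped at z = 1

def perspective_value(t, lam0, delta, grid=200001):
    """Brute-force the same one-dimensional minimization on a fine grid of z."""
    z = np.linspace(1e-6, 1.0, grid)
    return float(np.min(lam0 * z + delta * t**2 / z))

if __name__ == "__main__":
    lam0, delta = 1.0, 0.5  # illustrative values only
    for t in (0.0, 0.5, 1.0, 2.0):
        rh = float(reverse_huber(t, lam0, delta))
        print(t, perspective_value(t, lam0, delta), rh,
              rh - delta * t**2,
              float(mcp(t, 2.0 * np.sqrt(lam0 * delta), 1.0 / (2.0 * delta))))
```

In each printed row, the second and third columns should agree up to the grid resolution, and the last two columns should agree: the perspective function equals the reverse Huber form, and removing the quadratic term leaves a penalty that is concave in |t| and flat beyond sqrt(lam0/delta).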

Jeff T. Linderoth | Kun Chen | Hongbo Dong

[1] Hao Helen Zhang, et al. On the Adaptive Elastic-Net with a Diverging Number of Parameters, 2009, Annals of Statistics.

[2] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[3] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[4] Xiong Zhang, et al. Solving Large-Scale Sparse Semidefinite Programs for Combinatorial Optimization, 1999, SIAM J. Optim.

[5] Xin Shen, et al. Complementarity Formulations of ℓ0-norm Optimization Problems, 2013.

[6] Jianqing Fan, et al. Nonconcave Penalized Likelihood With NP-Dimensionality, 2009, IEEE Transactions on Information Theory.

[7] Dimitris Bertsimas, et al. Algorithm for cardinality-constrained quadratic optimization, 2009, Comput. Optim. Appl.

[8] Daniel Bienstock, et al. Computational study of a family of mixed-integer quadratic programming problems, 1995, Math. Program.

[9] D. Bertsimas, et al. Best Subset Selection via a Modern Optimization Lens, 2015, 1507.03133.

[10] R. Tibshirani, et al. Pathwise Coordinate Optimization, 2007, 0708.1485.

[11] J. Horowitz, et al. Asymptotic properties of bridge estimators in sparse high-dimensional regression models, 2008, 0804.0693.

[12] Jian Huang, et al. A Selective Review of Group Selection in High-Dimensional Models, 2012, Statistical Science: a review journal of the Institute of Mathematical Statistics.

[13] Chenlei Leng, et al. Unified LASSO Estimation by Least Squares Approximation, 2007.

[14] Martin J. Wainwright, et al. Sparse learning via Boolean relaxations, 2015, Mathematical Programming.

[15] Oktay Günlük, et al. Perspective reformulations of mixed integer nonlinear programs with indicator variables, 2010, Math. Program.

[16] Manfred W. Padberg, et al. The boolean quadric polytope: Some characteristics, facets and relatives, 1989, Math. Program.

[17] Jian Huang, et al. Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Applications to Biological Feature Selection, 2011, The Annals of Applied Statistics.

[18] R. Tibshirani, et al. Regression shrinkage and selection via the lasso: a retrospective, 2011.

[19] Y. Nesterov. Quality of semidefinite relaxation for nonconvex quadratic optimization, 1997.

[20] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.

[21] Jinchi Lv, et al. A unified approach to model selection and sparse recovery using regularized least squares, 2009, 0905.3573.

[22] J. Friedman, et al. A Statistical View of Some Chemometrics Regression Tools, 1993.

[23] Claudio Gentile, et al. Perspective cuts for a class of convex 0–1 mixed integer programs, 2006, Math. Program.

[24] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty, 2010, 1002.4734.

[25] Sara van de Geer, et al. Statistics for High-Dimensional Data, 2011.

[26] Renato D. C. Monteiro, et al. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, 2003, Math. Program.

[27] R. Steele. Optimization, 2005.

[28] J. Lasserre, et al. Handbook on Semidefinite, Conic and Polynomial Optimization, 2012.

[29] Oktay Günlük, et al. Perspective Reformulation and Applications, 2012.

[30] H. Zou. The Adaptive Lasso and Its Oracle Properties, 2006.

[31] Decision Systems, et al. Coordinate ascent for maximizing nondifferentiable concave functions, 1988.

[32] Michel Deza, et al. Geometry of cuts and metrics, 2009, Algorithms and Combinatorics.

[33] Claudio Gentile, et al. SDP diagonalizations and perspective cuts for a class of nonseparable MIQP, 2007, Oper. Res. Lett.

[34] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.

[35] Cun-Hui Zhang, et al. Adaptive Lasso for sparse high-dimensional regression models, 2008.

[36] David P. Williamson, et al. .879-approximation algorithms for MAX CUT and MAX 2SAT, 1994, STOC '94.

[37] Jeff T. Linderoth, et al. On Valid Inequalities for Quadratic Programming with Continuous Variables and Binary Indicators, 2013, IPCO.

[38] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.

[39] R. Tyrrell Rockafellar, et al. Convex Analysis, 1970, Princeton Landmarks in Mathematics and Physics.

[40] Jianqing Fan, et al. A Selective Overview of Variable Selection in High Dimensional Feature Space, 2009, Statistica Sinica.

[41] G. Casella, et al. Springer Texts in Statistics, 2016.

[42] Yinyu Ye, et al. DSDP5: Software for Semidefinite Programming, 2005.