Efficient Smoothed Concomitant Lasso Estimation for High Dimensional Regression

In high dimensional settings, sparse structures are crucial for efficiency, in terms of memory, computation, and statistical performance. It is customary to rely on an ℓ1 penalty to enforce sparsity in such scenarios. Sparsity-enforcing methods, the Lasso being a canonical example, are popular candidates to address high dimensional problems. For efficiency, they rely on tuning a parameter that trades data fitting against sparsity. For the Lasso theory to hold, this tuning parameter should be proportional to the noise level, yet the latter is often unknown in practice. A possible remedy is to optimize jointly over the regression parameter and the noise level. This has been considered under several names in the literature, for instance Scaled Lasso, Square-root Lasso, and Concomitant Lasso estimation, and could be of interest for confidence sets or uncertainty quantification. In this work, after illustrating numerical difficulties with the Concomitant Lasso formulation, we propose a modification, coined the Smoothed Concomitant Lasso, aimed at increasing numerical stability. We propose an efficient and accurate solver whose computational cost is no more expensive than that of the Lasso. We leverage standard ingredients behind the success of fast Lasso solvers: a coordinate descent algorithm, combined with safe screening rules that achieve speed by eliminating irrelevant features early.
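To make the joint optimization concrete: consistent with the Scaled/Square-root Lasso literature the abstract points to, the Smoothed Concomitant Lasso can be written as the jointly convex program

\min_{\beta \in \mathbb{R}^p,\; \sigma \ge \sigma_0} \; \frac{\|y - X\beta\|_2^2}{2 n \sigma} + \frac{\sigma}{2} + \lambda \|\beta\|_1,

where the lower bound \sigma_0 > 0 on the noise level is the smoothing ingredient that keeps the objective from degenerating as \sigma \to 0. The exact constant conventions are an assumption here; they are not spelled out in the abstract.

Below is a minimal, illustrative Python sketch of the alternating scheme this formulation suggests: for fixed \beta the optimal \sigma has a closed form, and for fixed \sigma the problem reduces to a Lasso solvable by cyclic coordinate descent. The function names are hypothetical, and safe screening rules (which the paper adds for speed) are deliberately omitted.

    import numpy as np

    def soft_threshold(u, t):
        # Soft-thresholding operator: proximal map of the l1 norm.
        return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

    def smoothed_concomitant_cd(X, y, lam, sigma0, n_iter=100):
        """Illustrative alternating coordinate-descent sketch for
            min over beta, sigma >= sigma0 of
                ||y - X beta||^2 / (2 n sigma) + sigma / 2 + lam * ||beta||_1.
        For fixed beta, the optimal sigma is max(sigma0, ||y - X beta|| / sqrt(n));
        for fixed sigma, beta solves a Lasso with penalty lam * sigma.
        """
        n, p = X.shape
        beta = np.zeros(p)
        resid = y.copy()                   # residual y - X beta
        col_sq = (X ** 2).sum(axis=0)      # ||x_j||^2; columns assumed nonzero
        sigma = max(sigma0, np.linalg.norm(resid) / np.sqrt(n))
        for _ in range(n_iter):
            for j in range(p):
                old = beta[j]
                # Coordinate update for the Lasso with penalty lam * sigma.
                u = old + X[:, j] @ resid / col_sq[j]
                beta[j] = soft_threshold(u, n * lam * sigma / col_sq[j])
                if beta[j] != old:
                    resid -= (beta[j] - old) * X[:, j]
            # Closed-form noise-level update, clipped at sigma0 for stability.
            sigma = max(sigma0, np.linalg.norm(resid) / np.sqrt(n))
        return beta, sigma

In practice one would choose lam on the order of sqrt(2 log(p) / n) (the pivotal scaling from the Square-root Lasso literature) and set sigma0 to a small fraction of a crude noise estimate such as ||y|| / sqrt(n); both choices are assumptions for illustration rather than prescriptions from the abstract.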
