GAP Safe screening rules for sparse multi-task and multi-class models

High-dimensional regression benefits from sparsity-promoting regularization. Screening rules exploit the known sparsity of the solution by ignoring some variables during optimization, hence speeding up solvers. When the procedure is proven never to discard features wrongly, the rules are said to be safe. In this paper we derive new safe rules for generalized linear models regularized with ℓ1 and ℓ1/ℓ2 norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This allows us to safely discard more variables, in particular for low regularization parameters. The GAP Safe rule can be used with any iterative solver, and we illustrate its performance on coordinate descent for the multi-task Lasso and for binary and multinomial logistic regression, demonstrating significant speed-ups over previous safe rules on all tested datasets.
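To make the mechanism concrete, below is a minimal sketch of one GAP Safe screening pass for the plain Lasso; the multi-task and multi-class cases follow the same pattern with group norms in place of absolute values. The helper name `gap_safe_screen_lasso` and the residual-rescaling construction of the dual point are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def gap_safe_screen_lasso(X, y, beta, lmbda):
    """One GAP Safe screening pass for the Lasso (illustrative sketch).

    Primal:  P(beta) = 0.5 * ||y - X @ beta||^2 + lmbda * ||beta||_1
    Dual:    D(theta) = 0.5 * ||y||^2
                        - 0.5 * lmbda**2 * ||theta - y / lmbda||^2,
             subject to |x_j^T theta| <= 1 for every feature j.

    Returns a boolean mask of features that can be safely discarded.
    """
    residual = y - X @ beta

    # Dual-feasible point by rescaling the residual so that the
    # constraint ||X^T theta||_inf <= 1 holds.
    theta = residual / max(lmbda, np.max(np.abs(X.T @ residual)))

    # Duality gap between the current primal point and the dual point.
    primal = 0.5 * residual @ residual + lmbda * np.abs(beta).sum()
    dual = 0.5 * (y @ y) - 0.5 * lmbda**2 * np.sum((theta - y / lmbda) ** 2)
    gap = max(primal - dual, 0.0)  # clamp tiny negative values from round-off

    # GAP Safe sphere: center theta, radius sqrt(2 * gap) / lmbda.
    # As the solver converges, gap -> 0, so the radius shrinks to zero
    # and more features pass the test below.
    radius = np.sqrt(2.0 * gap) / lmbda

    # Feature j can be screened out if the whole sphere stays strictly
    # inside its dual constraint: |x_j^T theta| + radius * ||x_j|| < 1.
    correlations = np.abs(X.T @ theta)
    norms = np.linalg.norm(X, axis=0)
    return correlations + radius * norms < 1.0
```

Because the radius is driven by the duality gap rather than by a fixed geometric bound, this test can be re-run at every few solver iterations (the dynamic setting), with the discarded set growing as the solver converges.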
