Gap Safe screening rules for sparsity enforcing penalties

In high-dimensional regression settings, sparsity-enforcing penalties have proved useful for regularizing the data-fitting term. A recently introduced technique, screening rules, proposes to ignore some variables during the optimization, leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to wrongly discard variables, the rules are said to be safe. In this work, we propose a unifying framework for generalized linear models regularized with standard sparsity-enforcing penalties such as $\ell_1$ or $\ell_1/\ell_2$ norms. Our technique safely discards more variables than previously proposed safe rules, particularly for low regularization parameters. Our Gap Safe rules (so called because they rely on duality-gap computations) can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods. Applied to many standard learning tasks, such as the Lasso, the Sparse-Group Lasso, the multi-task Lasso, and binary and multinomial logistic regression, we report significant speed-ups over previously proposed safe rules on all tested data sets.
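
To make the duality-gap mechanism concrete, here is a minimal NumPy sketch of a Gap Safe test specialized to the Lasso, $P(\beta) = \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. The function names and the plain-NumPy formulation are illustrative assumptions, not the paper's released code; the test itself follows the sphere construction the abstract alludes to, with center at a rescaled-residual dual point and radius $\sqrt{2\,\mathrm{Gap}}/\lambda$.

```python
import numpy as np

def lasso_duality_gap(X, y, beta, lam):
    """Duality gap and dual point for the Lasso
    P(beta) = 0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    residual = y - X @ beta
    # Rescale the residual so theta is dual feasible: |x_j^T theta| <= 1 for all j.
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    primal = 0.5 * residual @ residual + lam * np.abs(beta).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    return max(primal - dual, 0.0), theta

def gap_safe_screen(X, y, beta, lam):
    """Boolean mask of features that the Gap Safe sphere certifies inactive.

    Feature j is discarded when |x_j^T theta| + r * ||x_j||_2 < 1, where
    r = sqrt(2 * gap) / lam bounds the distance from theta to the dual optimum
    (the dual objective is lam^2-strongly concave)."""
    gap, theta = lasso_duality_gap(X, y, beta, lam)
    radius = np.sqrt(2.0 * gap) / lam
    correlations = np.abs(X.T @ theta)
    norms = np.linalg.norm(X, axis=0)
    return correlations + radius * norms < 1.0
```

In a dynamic setting, such a test would be re-run every few passes of the solver on the current iterate `beta`: as the gap shrinks, the sphere tightens and more columns of `X` can be removed from subsequent (block) coordinate descent updates, without ever discarding a feature that is active at the optimum.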
