A safe reinforced feature screening strategy for lasso based on feasible solutions

Abstract As a popular method in machine learning, the lasso performs regression and feature selection simultaneously. For large datasets, however, training the lasso efficiently remains a challenge. Recently, the Enhanced screening rule via Dual Polytope Projection (EDPP) was proposed to substantially reduce the scale of lasso problems by discarding inactive features beforehand. In practice, however, EDPP may mistakenly discard active features, because it relies on optimal solutions that are generally unavailable. To solve this problem, this paper introduces a safe reinforced feature screening rule based on EDPP and feasible solutions (S-EDPP). By utilizing feasible solutions and estimating a proper upper bound on the induced deviation, S-EDPP is guaranteed to be safe both in theory and in practice. A theoretical analysis of the deviation term in S-EDPP is given to verify its efficiency. Furthermore, S-EDPP is extended to accelerate the Elastic Net, a corrected variant of the lasso. Experiments on synthetic and real datasets verify that S-EDPP is a safe modification of EDPP and that it outperforms other existing safe screening rules.
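To make the screening idea concrete, the sketch below illustrates how a feasible (not necessarily optimal) primal point can still yield a provably safe test. It uses the standard duality-gap sphere construction for the lasso rather than the paper's own S-EDPP bound; the function name gap_safe_screen and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Boolean mask of features that can be safely discarded.

    Sketch of a sphere-based safe test built from any primal feasible
    point w (e.g., an intermediate solver iterate). This illustrates
    screening without the exact optimal solution; it is not S-EDPP.
    """
    residual = y - X @ w
    # Rescale the residual so the dual point is feasible:
    # ||X^T theta||_inf <= 1.
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    # Duality gap between the lasso primal and its dual at (w, theta).
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)
    # The optimal dual point lies in a ball of radius sqrt(2*gap)/lam
    # around theta; a feature whose correlation stays below 1 over the
    # whole ball must be inactive at the optimum.
    radius = np.sqrt(2.0 * gap) / lam
    col_norms = np.linalg.norm(X, axis=0)
    return np.abs(X.T @ theta) + radius * col_norms < 1.0
```

Features flagged by the mask can be removed before solving the reduced lasso problem. As the feasible point improves, the gap shrinks and the test discards more features, which mirrors the role the deviation bound plays in S-EDPP.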
