Safe feature screening rules for the regularized Huber regression

Abstract With the rapid development of data collection and storage techniques, we often encounter massive high-dimensional data sets that contain outliers and heavy-tailed errors. The regularized Huber regression has recently been developed extensively to deal with such complex data sets. Although dozens of papers are devoted to efficient solvers for the regularized Huber regression, solving it remains challenging when the number of features is extremely large. In this paper, we propose safe feature screening rules for the regularized Huber regression based on duality theory. These rules can remarkably accelerate existing solvers by quickly reducing the number of features: they identify and eliminate inactive features, i.e., features whose coefficients are guaranteed to be zero at the optimum, before the solver starts, so substantial computational effort is saved. Moreover, the proposed rules are safe in both theory and practice, in the sense that no active feature is ever discarded. Finally, experimental results on both synthetic and real data sets show that the proposed screening rules speed up the solution of the regularized Huber regression while preserving its accuracy. In particular, when the number of features is large, the speedup obtained by our rules can be orders of magnitude.
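
To make the screening mechanism concrete, the sketch below applies a generic duality-gap-based safe rule, in the spirit of the Gap Safe framework of Ndiaye et al., to the l1-regularized Huber regression. It is a minimal illustration under stated assumptions, not the authors' exact rules: the problem formulation, the dual-point construction, and the names huber_loss and gap_safe_screen are our own choices for exposition.

```python
import numpy as np

def huber_loss(t, delta):
    """Elementwise Huber loss: quadratic for |t| <= delta, linear in the tails."""
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * t ** 2, delta * (a - 0.5 * delta))

def gap_safe_screen(X, y, beta, lam, delta):
    """Boolean mask of features provably inactive at the optimum of
        min_beta  sum_i h_delta(y_i - x_i @ beta) + lam * ||beta||_1 .
    `beta` is any primal iterate; the closer it is to the optimum,
    the more features the rule can eliminate.
    """
    residual = y - X @ beta
    grad = np.clip(residual, -delta, delta)        # h_delta'(residual)
    # Dual-feasible point: rescale so that ||X.T @ theta||_inf <= 1
    # (feasibility |lam * theta_i| <= delta holds since |grad_i| <= delta).
    theta = grad / max(lam, np.max(np.abs(X.T @ grad)))
    # Duality gap between the primal objective and the Huber dual objective.
    primal = huber_loss(residual, delta).sum() + lam * np.abs(beta).sum()
    dual = lam * theta @ y - 0.5 * lam ** 2 * theta @ theta
    gap = max(primal - dual, 0.0)
    # The dual objective is lam^2-strongly concave, so the dual optimum lies
    # in a ball of radius sqrt(2 * gap) / lam around theta.
    radius = np.sqrt(2.0 * gap) / lam
    # By the KKT conditions, feature j is provably inactive whenever
    # |x_j.T @ theta| + radius * ||x_j||_2 < 1.
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0
```

In this setup a solver would call gap_safe_screen once before starting (or periodically at its iterates) and drop the flagged columns of X; safety means that no feature active at the optimum is ever removed, so the reduced problem has the same solution as the full one.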
