Approaches to Regularized Regression – A Comparison between Gradient Boosting and the Lasso

BACKGROUND Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection, such as the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques.

OBJECTIVES Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, from both a methodological and a practical perspective.

METHODS We describe these methods, highlighting the circumstances under which their results coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing their performance in settings with more predictors than observations, investigating multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results.

RESULTS Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster run-time; a strength of the boosting concept is its modular nature, which makes it easy to extend to other regression settings.

CONCLUSIONS Although the two methods follow different strategies with respect to optimization and regularization, they impose similar constraints on the estimation problem, leading to comparable performance regarding prediction accuracy and variable selection in practice.
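
To make the comparison concrete, the sketch below contrasts the two approaches on simulated data with more predictors than observations. This is not the code used in the article: the data-generating settings, the use of scikit-learn's LassoCV for the lasso, and the hand-rolled component-wise L2 boosting routine (with illustrative step length nu and stopping iteration mstop) are assumptions made purely for illustration. Comparable analyses are commonly run in R with the packages glmnet (lasso) and mboost (model-based boosting).

```python
# Minimal sketch (not the authors' code): lasso vs. component-wise L2 boosting
# for a sparse linear model with p > n. Data-generating settings, step length
# `nu`, and stopping iteration `mstop` are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(42)
n, p, n_true = 100, 500, 5               # more predictors than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:n_true] = 2.0                      # few true non-zero coefficients
y = X @ beta + rng.standard_normal(n)    # additive Gaussian noise

# --- Lasso: penalty parameter tuned by cross-validation --------------------
lasso = LassoCV(cv=5).fit(X, y)
lasso_selected = np.flatnonzero(lasso.coef_)

# --- Component-wise L2 boosting (gradient boosting, linear base-learners) ---
def l2_boost(X, y, mstop=250, nu=0.1):
    """Repeatedly fit the single best predictor to the current residuals
    and update its coefficient by a small step length nu."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef = np.zeros(X.shape[1])
    resid = yc.copy()
    col_ss = (Xc ** 2).sum(axis=0)
    for _ in range(mstop):
        ls_coef = Xc.T @ resid / col_ss                   # univariate LS fits
        rss = ((resid[:, None] - Xc * ls_coef) ** 2).sum(axis=0)
        j = int(np.argmin(rss))                           # best-fitting component
        coef[j] += nu * ls_coef[j]
        resid -= nu * ls_coef[j] * Xc[:, j]
    return coef

boost_coef = l2_boost(X, y)
boost_selected = np.flatnonzero(boost_coef)

print("lasso selected:   ", lasso_selected)
print("boosting selected:", boost_selected)
```

The tuning step mirrors the comparison discussed in the abstract: for the lasso, the penalty parameter is chosen here by cross-validation, whereas for boosting the main tuning parameter is the number of iterations mstop, which is simply fixed in this sketch; selecting it by cross-validation or resampling would act as early stopping and is the usual way to regularize the boosting fit.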
