High-Dimensional $L_2$Boosting: Rate of Convergence

Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, a boosting variant tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting", a post-selection estimator that applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is "Orthogonal Boosting", where an orthogonal projection is conducted after each step. We show that both post-$L_2$Boosting and Orthogonal Boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting, whereas the rate of convergence of classical $L_2$Boosting depends on the design matrix, characterized by a sparse eigenvalue constant. To establish the latter result, we derive new approximation results for the pure greedy algorithm, based on an analysis of the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results further allow a direct comparison between LASSO and boosting, which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.
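To make the three estimators concrete, the following is a minimal sketch in Python/NumPy for a linear model with design matrix X and response y. It is not the paper's implementation: the function names and the shrinkage parameter nu are illustrative assumptions, and the paper's feasible early-stopping rules are replaced here by a fixed step count n_steps.

```python
import numpy as np


def l2_boost(X, y, n_steps=50, nu=0.1):
    """Componentwise L2Boosting (the pure greedy algorithm).

    Each step regresses the current residual on the single predictor
    that reduces the residual sum of squares the most, then adds a
    nu-shrunken multiple of that univariate fit to the ensemble.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float)
    norms = (X ** 2).sum(axis=0)            # ||x_j||^2 for each column
    selected = set()
    for _ in range(n_steps):                # fixed step count stands in for early stopping
        scores = X.T @ residual             # <x_j, r> for each column
        coefs = scores / norms              # univariate least-squares coefficients
        j = int(np.argmax(scores * coefs))  # largest RSS reduction <x_j, r>^2 / ||x_j||^2
        beta[j] += nu * coefs[j]
        residual = residual - nu * coefs[j] * X[:, j]
        selected.add(j)
    return beta, sorted(selected)


def post_l2_boost(X, y, selected):
    """Post-Boosting: ordinary least squares on the variables
    selected in the first stage by l2_boost."""
    beta = np.zeros(X.shape[1])
    beta[selected], *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return beta


def orthogonal_boost(X, y, n_steps=10):
    """Orthogonal Boosting: after each greedy selection, project y onto
    the span of all columns selected so far (as in orthogonal matching
    pursuit), so the residual stays orthogonal to the selected set."""
    selected = []
    residual = y.astype(float)
    norms = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        scores = X.T @ residual
        j = int(np.argmax(scores ** 2 / norms))
        if j not in selected:
            selected.append(j)
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return post_l2_boost(X, y, selected)
```

Running l2_boost and then post_l2_boost on its selected set gives post-$L_2$Boosting; orthogonal_boost differs from the classical variant only in replacing the single-coordinate update by a full projection onto the selected columns after each step.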
