Boosting for high-dimensional linear models

We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising with a strongly overcomplete dictionary whenever the underlying signal is sparse in the ℓ1-norm. We also propose an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive, since the algorithm no longer needs to be run multiple times for cross-validation, as has been common practice so far. We demonstrate L2Boosting on simulated data, in particular where the predictor dimension is large relative to the sample size, and on a difficult tumor-classification problem with gene expression microarray data.
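To make the two main ingredients concrete, the following is a minimal Python sketch of componentwise L2Boosting with an AIC-type stopping rule. It is an illustration under assumptions, not the paper's implementation: the function name `l2_boost_aic`, the step-length `nu = 0.1`, the iteration cap, and the particular corrected-AIC formula (computed from the trace of the boosting hat matrix) are choices made for this example.

```python
import numpy as np

def l2_boost_aic(X, y, max_iter=500, nu=0.1):
    """Componentwise L2Boosting with an AIC-type stopping rule (illustrative sketch)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # center the predictors
    norms = (Xc ** 2).sum(axis=0)      # assumes no constant (zero-variance) columns
    intercept = y.mean()
    beta = np.zeros(p)
    resid = y - intercept

    B = np.full((n, n), 1.0 / n)       # hat matrix of the mean-only fit
    I = np.eye(n)
    best_aicc, best_m, best_beta = np.inf, 0, beta.copy()

    for m in range(1, max_iter + 1):
        # componentwise least squares: fit the current residuals on each predictor
        slopes = Xc.T @ resid / norms
        sse = (resid ** 2).sum() - slopes ** 2 * norms
        j = int(np.argmin(sse))        # predictor reducing the residual SS the most

        # update only the selected coefficient by a small step-length nu
        beta[j] += nu * slopes[j]
        resid = resid - nu * slopes[j] * Xc[:, j]

        # update the boosting hat matrix: B_m = B_{m-1} + nu * H_j (I - B_{m-1})
        Hj = np.outer(Xc[:, j], Xc[:, j]) / norms[j]
        B = B + nu * Hj @ (I - B)

        # corrected AIC with trace(B) playing the role of degrees of freedom
        df = np.trace(B)
        sigma2 = (resid ** 2).mean()
        if df + 2 < n:                 # criterion is only defined in this range
            aicc = np.log(sigma2) + (1 + df / n) / (1 - (df + 2) / n)
            if aicc < best_aicc:
                best_aicc, best_m, best_beta = aicc, m, beta.copy()

    return intercept, best_beta, best_m
```

Each iteration fits simple least squares of the current residuals on every single predictor, updates only the best-fitting coefficient by the fraction nu, and records the iteration at which the corrected AIC is minimal. The stopping iteration is thus chosen in a single run of the algorithm, which is the computational advantage over repeated cross-validation runs mentioned in the abstract.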
