Boosting for high-dimensional linear models

We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of its ℓ1-norm. We also propose an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive, since it does not require running the algorithm multiple times for cross-validation, as has been common practice so far. We demonstrate L2Boosting on simulated data, in particular where the predictor dimension is large in comparison to the sample size, and on a difficult tumor-classification problem with gene expression microarray data.
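As a rough illustration of how such an AIC-tuned procedure can look, the Python sketch below implements componentwise L2Boosting for a linear model: each iteration fits the current residuals by simple least squares on the single predictor that reduces the residual sum of squares the most and adds a shrunken version of that fit, and the stopping iteration is picked by a corrected AIC computed from the trace of the boosting hat operator. This is an assumption-laden sketch, not the paper's reference implementation; the function name l2_boost_aic, the default step length nu = 0.1, the iteration cap, and the exact AICc formula are illustrative choices.

    import numpy as np

    def l2_boost_aic(X, y, nu=0.1, max_iter=200):
        # Componentwise L2Boosting with an AIC-type stopping rule (illustrative sketch).
        # Each iteration regresses the current residuals on every single predictor,
        # keeps the predictor with the smallest residual sum of squares, and adds a
        # shrunken version (step length nu) of that simple least-squares fit.
        n, p = X.shape
        Xc = X - X.mean(axis=0)              # center the predictors
        yc = y - y.mean()                    # intercept handled separately as y.mean()
        norms = (Xc ** 2).sum(axis=0)

        beta = np.zeros(p)
        residual = yc.copy()
        B = np.zeros((n, n))                 # hat matrix of the boosting fit, B_m
        I = np.eye(n)
        aic_path, beta_path = [], []

        for m in range(max_iter):
            coef = Xc.T @ residual / norms   # componentwise least-squares coefficients
            rss = (residual ** 2).sum() - coef ** 2 * norms
            j = int(np.argmin(rss))          # best single predictor in this iteration
            beta[j] += nu * coef[j]
            residual -= nu * coef[j] * Xc[:, j]

            # Hat-matrix update: B_m = B_{m-1} + nu * H_j (I - B_{m-1}), where
            # H_j = x_j x_j^T / ||x_j||^2 is the projection onto predictor j.
            H_j = np.outer(Xc[:, j], Xc[:, j]) / norms[j]
            B = B + nu * H_j @ (I - B)

            # Corrected AIC (assumed form), with df = trace(B_m) as effective degrees of freedom.
            df = np.trace(B)
            sigma2 = (residual ** 2).mean()
            aic_path.append(np.log(sigma2) + (1.0 + df / n) / (1.0 - (df + 2.0) / n))
            beta_path.append(beta.copy())

        m_stop = int(np.argmin(aic_path)) + 1    # AIC-optimal number of iterations
        return m_stop, y.mean(), beta_path[m_stop - 1], np.array(aic_path)

Given an n x p design matrix X and response y, a call such as m_stop, intercept, beta, path = l2_boost_aic(X, y) returns the AIC-chosen number of iterations and the corresponding coefficient vector; since the predictors are centered inside the function, fitted values are intercept + (X - X.mean(axis=0)) @ beta. Tracking the full n x n hat matrix is only for the AIC computation and is meant for modest sample sizes in this sketch.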
