Boosting Methods for Regression

In this paper we examine ensemble methods for regression that leverage or “boost” base regressors by iteratively calling them on modified samples. The most successful leveraging algorithm for classification is AdaBoost, an algorithm that requires only modest assumptions on the base learning method for its strong theoretical guarantees. We present several gradient descent leveraging algorithms for regression and prove AdaBoost-style bounds on their sample errors using intuitive assumptions on the base learners. We bound the complexity of the regression functions produced in order to derive PAC-style bounds on their generalization errors. Experiments validate our theoretical results.
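
To make the gradient-descent view of leveraging concrete, the following is a minimal sketch, not the paper's own algorithms: under squared loss, each round fits a base regressor to the negative gradient of the loss at the current master hypothesis (the residuals) and takes a small step along it. The one-split regression stump base learner, the step size, and all names here are illustrative assumptions.

import numpy as np

def fit_stump(x, r):
    # Hypothetical base learner: a one-split regression stump chosen to
    # minimize squared error against the target values r.
    best = None
    for s in np.unique(x)[:-1]:          # candidate thresholds (keep right side nonempty)
        left, right = r[x <= s], r[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, lv, rv = best
    return lambda z: np.where(z <= s, lv, rv)

def leverage(x, y, rounds=100, step=0.1):
    # Gradient-descent leveraging under squared loss: each round the base
    # regressor is fit to the negative gradient of the loss at the current
    # master hypothesis, which for squared loss is the residual y - F(x).
    ensemble = []
    pred = np.zeros_like(y, dtype=float)
    for _ in range(rounds):
        h = fit_stump(x, y - pred)       # base learner approximates the gradient direction
        pred = pred + step * h(x)        # small step along the base hypothesis
        ensemble.append(h)
    return lambda z: step * sum(h(z) for h in ensemble)

# Toy usage: boost stumps on a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)
F = leverage(x, y)
print("training MSE:", np.mean((y - F(x)) ** 2))

The fixed step size keeps the sketch simple; line search over the step, or other losses with their corresponding gradients, fit the same template.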
