Importance Sampled Learning Ensembles

Learning a function of many arguments is viewed from the perspective of high-dimensional numerical quadrature. It is shown that many popular ensemble learning procedures can be cast in this framework. In particular, randomized methods, including bagging and random forests, are seen to correspond to Monte Carlo integration methods, each based on a particular importance sampling strategy. Non-random boosting methods are seen to correspond to deterministic quasi-Monte Carlo integration techniques. This view helps explain some of their properties and suggests modifications that can substantially improve their accuracy while dramatically reducing computational cost.
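As a sketch of the correspondence the abstract describes (the notation below is illustrative, not taken verbatim from the paper): an ensemble prediction can be written as an integral over a parameterized family of base learners,

$$F(x) \;=\; \int a(\mathbf{p})\, f(x;\mathbf{p})\, d\mathbf{p} \;\approx\; \sum_{m=1}^{M} c_m\, f(x;\mathbf{p}_m),$$

where $f(x;\mathbf{p})$ is a base learner (e.g. a decision tree) indexed by its parameter vector $\mathbf{p}$. Drawing the ensemble members $\{\mathbf{p}_m\}_{m=1}^{M}$ at random from a sampling distribution $r(\mathbf{p})$ makes the sum a Monte Carlo quadrature rule, with bagging and random forests corresponding to particular importance-sampling choices of $r(\mathbf{p})$; selecting the $\mathbf{p}_m$ deterministically, one at a time, yields a quasi-Monte Carlo rule of the kind boosting implements.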
