Combining Stacking With Bagging To Improve A Learning Algorithm

In bagging [Bre94a] one uses bootstrap replicates of the training set [Efr79, ET93] to improve a learning algorithm's performance, often by tens of percent. This paper presents several ways that stacking [Wol92b, Bre92] can be used in concert with the bootstrap procedure to achieve a further improvement over bagging on some regression problems. In particular, in some of the work presented here, one first converts a single underlying learning algorithm into several learning algorithms by bootstrap resampling the training set, exactly as in bagging. The resultant algorithms are then combined via stacking. This procedure can be viewed as a variant of bagging in which stacking, rather than uniform averaging, is used to do the combining. Stacking improves performance over simple bagging by up to a factor of two on the tested problems, and never performed worse than simple bagging. In other work presented here there is no step of converting the underlying learning algorithm into multiple algorithms, so it is the improve-a-single-algorithm variant of stacking that is relevant. The precise version of this scheme tested here can be viewed as using the bootstrap and stacking to estimate the input-dependence of the statistical bias and then correct for it. The results are preliminary, but again indicate that combining stacking with the bootstrap can be helpful.
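As a concrete illustration of the first variant, the following is a minimal sketch, not the paper's actual experimental setup: it assumes scikit-learn regression trees as the underlying learner, a synthetic regression task, and a simple held-out split for fitting the level-1 combiner (all of these choices are hypothetical). The training set is bootstrap-resampled to turn one learner into an ensemble, and the combination is then learned by stacking rather than fixed at a uniform average.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy regression data (synthetic; stands in for the paper's test problems).
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(len(X))

# Three-way split: level-0 training set, held-out set for the stacker, test set.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_stack, X_test, y_stack, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Bootstrap-resample the training set to turn one learner into many, exactly as in bagging.
n_members = 25
members = []
for _ in range(n_members):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    members.append(DecisionTreeRegressor(max_depth=4).fit(X_train[idx], y_train[idx]))

def level1_inputs(X_):
    """Each member's predictions, stacked column-wise: the level-1 feature matrix."""
    return np.column_stack([m.predict(X_) for m in members])

# Simple bagging: uniform average of the member predictions.
bagged_test = level1_inputs(X_test).mean(axis=1)

# Stacked combination: learn the combining map on held-out data instead of averaging.
stacker = LinearRegression().fit(level1_inputs(X_stack), y_stack)
stacked_test = stacker.predict(level1_inputs(X_test))

print("bagging  MSE:", mean_squared_error(y_test, bagged_test))
print("stacking MSE:", mean_squared_error(y_test, stacked_test))
```

The held-out split here merely stands in for whatever partitioning scheme (e.g. cross-validation) one uses to generate level-1 training data; the essential point is that the combining weights are learned from data rather than fixed at 1/N as in simple bagging.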

[1] B. Efron. Computers and the Theory of Statistics: Thinking the Unthinkable, 1979.

[2] David H. Wolpert et al. On the Connection between In-sample Testing and Generalization Error, Complex Systems, 1992.

[3] David H. Wolpert et al. Stacked generalization, Neural Networks, 1992.

[4] Elie Bienenstock et al. Neural Networks and the Bias/Variance Dilemma, Neural Computation, 1992.

[5] Anders Krogh et al. Neural Network Ensembles, Cross Validation, and Active Learning, NIPS, 1994.

[6] David H. Wolpert et al. The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework, 1995.

[7] Eric B. Bartlett et al. Error Estimation by Series Association for Neural Network Systems, Neural Computation, 1995.

[8] Ronny Meir. Bias, variance and the combination of estimators: the case of linear least squares, 1995.

[9] Salvatore J. Stolfo et al. A Comparative Evaluation of Voting and Meta-learning on Partitioned Data, ICML, 1995.

[10] David H. Wolpert et al. The Existence of A Priori Distinctions Between Learning Algorithms, Neural Computation, 1996.

[11] Yoav Freund et al. Experiments with a New Boosting Algorithm, ICML, 1996.

[12] L. Breiman. Heuristics of instability and stabilization in model selection, 1996.

[13] R. Tibshirani et al. Combining Estimates in Regression and Classification, 1996.

[14] David H. Wolpert et al. On Bias Plus Variance, Neural Computation, 1997.