Stacking Bagged and Dagged Models

In this paper, we investigate the method of stacked generalization in combining models derived from different subsets of a training dataset by a single learning algorithm, as well as different algorithms. The simplest way to combine predictions from competing models is majority vote, and the effect of the sampling regime used to generate training subsets has already been studied in this context: when bootstrap samples are used the method is called bagging, and for disjoint samples we call it dagging. This paper extends these studies to stacked generalization, where a learning algorithm is employed to combine the models. This yields new methods dubbed bag-stacking and dag-stacking. We demonstrate that bag-stacking and dag-stacking can be effective for classification tasks even when the training samples cover just a small fraction of the full dataset. In contrast to earlier bagging results, we show that bagging and bag-stacking work for stable as well as unstable learning algorithms, as do dagging and dag-stacking. We find that bag-stacking (dag-stacking) almost always has higher predictive accuracy than bagging (dagging), and we also show that bag-stacking models derived using two different algorithms is more effective than bagging.
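To make the distinctions concrete, the sketch below contrasts the sampling regimes (bootstrap vs. disjoint subsets) and the two combination schemes (majority vote vs. a level-1 learner). It is a minimal illustration using scikit-learn-style estimators; the choice of decision trees as level-0 models and logistic regression as the level-1 learner is an assumption for this example, not the paper's experimental setup, and for brevity the level-1 data is built from predictions on the full training set rather than the held-out (e.g. out-of-bag) predictions that stacked generalization properly calls for.

```python
# Illustrative sketch only: bagging/dagging by majority vote vs. bag-/dag-stacking
# with a level-1 learner. Base and meta learners here are assumptions, not the
# paper's exact configuration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression


def make_subsets(n_samples, n_models, scheme, rng):
    """Index sets for the level-0 models: bootstrap samples ("bag") or
    disjoint samples ("dag")."""
    if scheme == "bag":
        return [rng.choice(n_samples, size=n_samples, replace=True)
                for _ in range(n_models)]
    perm = rng.permutation(n_samples)
    return np.array_split(perm, n_models)


def fit_level0(X, y, n_models, scheme, seed=0):
    """Train one level-0 model per training subset."""
    rng = np.random.default_rng(seed)
    return [DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
            for idx in make_subsets(len(y), n_models, scheme, rng)]


def predict_vote(models, X):
    """Plain bagging/dagging: combine level-0 predictions by majority vote.
    Assumes integer class labels."""
    votes = np.stack([m.predict(X) for m in models])   # (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


def fit_stacker(models, X, y):
    """Bag-stacking / dag-stacking: a level-1 learner is trained on the
    level-0 models' class probabilities instead of taking a vote.
    (Simplification: uses full-training-set predictions; a faithful
    implementation would use out-of-bag or held-out predictions.
    Assumes every subset contains all classes.)"""
    meta_features = np.hstack([m.predict_proba(X) for m in models])
    return LogisticRegression(max_iter=1000).fit(meta_features, y)


def predict_stacked(models, stacker, X):
    """Predict with the stacked ensemble: level-0 probabilities -> level-1 learner."""
    meta_features = np.hstack([m.predict_proba(X) for m in models])
    return stacker.predict(meta_features)
```

Using class probabilities rather than predicted labels as level-1 features is one common choice for the stacking step; the paper's point is simply that learning the combination (the `fit_stacker` path) tends to outperform the fixed majority vote (the `predict_vote` path) under both sampling regimes.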
