Online Ensemble Learning: An Empirical Study

We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time- and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published “boosting by filtering” framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting-style and bagging-style online ensemble methods. We evaluate these methods on both our branch prediction domain and online variants of three familiar machine-learning benchmarks. Our data support three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and are additionally critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm; the bagging algorithms we tried performed poorly relative to the boosting-style algorithm, though they still improved upon individual base learners. Third, we show that ensembles of small trees often outperform a large single tree with the same total number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains such as branch prediction with tight space restrictions (i.e., the available real estate on a microprocessor chip).
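
To make the ensemble update concrete, the following is a minimal Python sketch of an Arc-x4-style online boosting update of the kind described above: each arriving example is passed through the ensemble members in sequence, and member i trains on it with weight 1 + m^4, where m is the number of earlier members that mispredicted it. The class and method names (OnlineArcx4Ensemble, MajorityClassLearner, learn, predict, update) are illustrative assumptions, not the paper's implementation, and the toy weighted base learner merely stands in for the online ID4-variant decision tree.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy weighted online base learner (stand-in for an online ID4 tree)."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y, weight=1):
        # Accumulate weighted label counts; a real base learner would
        # use (x, weight) to grow/update a decision tree online.
        self.counts[y] += weight

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0


class OnlineArcx4Ensemble:
    """Sketch of an Arc-x4-style online boosting ensemble."""

    def __init__(self, make_learner, size):
        self.members = [make_learner() for _ in range(size)]

    def predict(self, x):
        # Unweighted majority vote over the members' predictions.
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]

    def update(self, x, y):
        # Member i trains with weight 1 + m**4, where m counts how many
        # of members 0..i-1 mispredicted this example (test-then-train).
        mistakes = 0
        for member in self.members:
            weight = 1 + mistakes ** 4
            if member.predict(x) != y:
                mistakes += 1
            member.learn(x, y, weight)


# Usage: process a stream of (features, label) pairs one at a time,
# predicting on each example before learning from it.
ensemble = OnlineArcx4Ensemble(MajorityClassLearner, size=8)
for x, y in [((1, 0), 1), ((0, 1), 0), ((1, 1), 1)]:
    guess = ensemble.predict(x)
    ensemble.update(x, y)
```

The polynomial weighting is what makes this boosting-style rather than bagging-style: later members concentrate on the examples that earlier members get wrong, while every member still sees the full stream, which keeps the update (parallel) time- and space-efficient.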
