Ensembles of Restricted Hoeffding Trees

The success of simple methods for classification shows that it is often not necessary to model complex attribute interactions to obtain good classification accuracy on practical problems. In this article, we propose to exploit this phenomenon in the data stream context by building an ensemble of Hoeffding trees that are each limited to a small subset of attributes. In this way, each tree is restricted to modeling interactions between attributes in its corresponding subset. Because it is not known a priori which attribute subsets are relevant for prediction, we build exhaustive ensembles that consider all possible attribute subsets of a given size. As the resulting Hoeffding trees are not all equally important, we weight them in a suitable manner to obtain accurate classifications. This is done by combining the log-odds of their probability estimates using sigmoid perceptrons, with one perceptron per class. We propose a mechanism for setting the perceptrons' learning rate using the ADWIN change detection method for data streams, and we also use ADWIN to reset ensemble members (i.e., Hoeffding trees) when they no longer perform well. Our experiments show that the resulting ensemble classifier outperforms bagging for data streams in terms of accuracy when both are used in conjunction with adaptive naive Bayes Hoeffding trees, at the expense of runtime and memory consumption. We also show that our stacking method can improve the performance of a bagged ensemble.
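The two core ideas of the abstract, enumerating all attribute subsets of a given size (one Hoeffding tree per subset) and combining the trees' probability estimates via a per-class sigmoid perceptron over their log-odds, can be sketched as follows. This is a minimal illustration, not the paper's implementation (which builds on MOA's Hoeffding trees); the class and function names are chosen here for exposition, and the learning rate is fixed rather than set by ADWIN.

```python
import itertools
import math

def attribute_subsets(n_attributes, k):
    """All size-k attribute subsets; the ensemble holds one Hoeffding tree per subset."""
    return list(itertools.combinations(range(n_attributes), k))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_odds(p, eps=1e-9):
    """Log-odds of a probability estimate, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

class PerClassPerceptron:
    """One sigmoid perceptron per class, trained online on the
    log-odds of the ensemble members' probability estimates."""

    def __init__(self, n_members, learning_rate=0.01):
        self.w = [0.0] * n_members  # one weight per ensemble member
        self.bias = 0.0
        self.lr = learning_rate     # in the paper, adapted via ADWIN

    def predict(self, member_probs):
        """Combined probability for this class, given each member's estimate."""
        z = self.bias + sum(wi * log_odds(p)
                            for wi, p in zip(self.w, member_probs))
        return sigmoid(z)

    def update(self, member_probs, y):
        """Online gradient step on the cross-entropy loss; y is 0 or 1."""
        err = y - self.predict(member_probs)
        for i, p in enumerate(member_probs):
            self.w[i] += self.lr * err * log_odds(p)
        self.bias += self.lr * err
```

For a stream with 4 attributes and subsets of size 2, `attribute_subsets(4, 2)` yields the 6 trees of the exhaustive ensemble; after each instance, every tree's class-probability estimate is fed to the perceptron of each class, and the perceptrons are updated with the true label.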
