The online performance estimation framework: heterogeneous ensemble learning for data streams

Ensembles of classifiers are among the best-performing classifiers available in many data mining applications, including the mining of data streams. Rather than training one classifier, multiple classifiers are trained, and their predictions are combined according to a given voting scheme. An important prerequisite for an ensemble to succeed is that its individual models are diverse. One way to greatly increase the diversity among the models is to build a heterogeneous ensemble, composed of fundamentally different model types. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of base-level classifier, most often Hoeffding Trees. We study the use of heterogeneous ensembles for data streams. We introduce the Online Performance Estimation framework, which dynamically weights the votes of the individual classifiers in an ensemble. Using an internal evaluation on recent training data, it measures how well each ensemble member has performed on that data and updates the members' weights accordingly. Experiments over a wide range of data streams show performance that is competitive with state-of-the-art ensemble techniques, including Online Bagging and Leveraging Bagging, while being significantly faster. All experimental results from this work are easily reproducible and publicly available online.
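The weighting idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each member's weight is its accuracy over a sliding window of the most recent training examples, and that each member is scored on an example before it trains on that example (test-then-train). The window size, the neutral starting weight of 1.0, the `Learner` interface, and the toy members `ConstantClassifier` and `ThresholdClassifier` are all hypothetical choices made for illustration. In practice the members would be real, heterogeneous incremental learners such as Hoeffding Trees.

```python
from collections import deque
from typing import Dict, Hashable, List, Protocol


class Learner(Protocol):
    """Hypothetical minimal interface for an incremental base classifier."""
    def predict(self, x: float) -> Hashable: ...
    def learn(self, x: float, y: Hashable) -> None: ...


class WindowedAccuracy:
    """Accuracy over the most recent `window` examples; older ones are forgotten."""
    def __init__(self, window: int) -> None:
        self.hits: deque = deque(maxlen=window)

    def update(self, correct: bool) -> None:
        self.hits.append(1 if correct else 0)

    def value(self) -> float:
        # With no evidence yet, every member gets the same neutral weight (assumption).
        return sum(self.hits) / len(self.hits) if self.hits else 1.0


class OnlinePerformanceEnsemble:
    """Weighted-vote ensemble whose vote weights are windowed accuracy estimates."""
    def __init__(self, members: List[Learner], window: int = 100) -> None:
        self.members = members
        self.estimators = [WindowedAccuracy(window) for _ in members]

    def weights(self) -> List[float]:
        return [est.value() for est in self.estimators]

    def predict(self, x: float) -> Hashable:
        votes: Dict[Hashable, float] = {}
        for member, est in zip(self.members, self.estimators):
            label = member.predict(x)
            votes[label] = votes.get(label, 0.0) + est.value()
        # The label with the largest summed weight wins; ties go to the first label seen.
        return max(votes, key=votes.__getitem__)

    def learn(self, x: float, y: Hashable) -> None:
        # Test-then-train: score each member on (x, y) before it learns from it.
        for member, est in zip(self.members, self.estimators):
            est.update(member.predict(x) == y)
            member.learn(x, y)


class ConstantClassifier:
    """Toy member that always predicts the same label (hypothetical)."""
    def __init__(self, label: Hashable) -> None:
        self.label = label

    def predict(self, x: float) -> Hashable:
        return self.label

    def learn(self, x: float, y: Hashable) -> None:
        pass  # never adapts


class ThresholdClassifier:
    """Toy member that predicts 1 when x exceeds a fixed threshold (hypothetical)."""
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, x: float) -> Hashable:
        return 1 if x > self.threshold else 0

    def learn(self, x: float, y: Hashable) -> None:
        pass  # fixed rule, for illustration only


# Usage: two weak constant members initially outvote an accurate rule.
ens = OnlinePerformanceEnsemble(
    [ConstantClassifier(0), ConstantClassifier(0), ThresholdClassifier(0.5)], window=10
)
print(ens.predict(0.9))  # 0: with equal weights, the two constant members win
for x in [0.9, 0.8, 0.7, 0.2, 0.6, 0.95, 0.1, 0.75, 0.85, 0.65]:
    ens.learn(x, 1 if x > 0.5 else 0)
print(ens.weights())     # [0.2, 0.2, 1.0]
print(ens.predict(0.9))  # 1: the rule's higher estimated accuracy now dominates
```

Because the estimate only covers a recent window, a member that starts performing poorly after a concept drift loses influence within roughly `window` examples. This is the property that makes windowed estimates attractive for data streams. A fading-factor estimate would be an equally plausible alternative under the same interface.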
