Dynamic weighted majority: a new ensemble method for tracking concept drift

Algorithms for tracking concept drift are important for many applications. We present a general method, based on the weighted majority algorithm, for using any online learner to track concept drift. Dynamic weighted majority (DWM) maintains an ensemble of base learners, predicts using a weighted-majority vote of these "experts", and dynamically creates and deletes experts in response to changes in performance. We empirically evaluated two experimental systems based on the method, using incremental naive Bayes and the incremental tree inducer (ITI) as experts. For comparison, we also included Blum's implementation of the weighted majority algorithm. On the STAGGER concepts and on the SEA concepts, results suggest that the ensemble method learns drifting concepts almost as well as the base algorithms learn each concept individually. Indeed, we report the best overall results for these problems to date.
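
The abstract sketches the whole learning loop: experts are trained online, the ensemble predicts by a weighted-majority vote, experts that err have their weights reduced, low-weight experts are removed, and a new expert is created when the ensemble as a whole errs. The Python sketch below illustrates that loop under stated assumptions: the toy naive Bayes expert, the parameter names (beta, theta, period), and their default values are illustrative choices, not the authors' implementation; in the paper the experts are incremental naive Bayes and ITI.

from collections import defaultdict


class NaiveBayesExpert:
    """Toy incremental naive Bayes over discrete features (stand-in for the paper's base learners)."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)   # (class, feature index, value) -> count

    def learn(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i, v)] += 1

    def predict(self, x):
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, cc in self.class_counts.items():
            score = cc / total
            for i, v in enumerate(x):
                score *= (self.feature_counts[(c, i, v)] + 1) / (cc + 2)   # Laplace smoothing
            if score > best_score:
                best, best_score = c, score
        return best


class DWM:
    """Sketch of dynamic weighted majority over a pool of incremental experts."""

    def __init__(self, make_expert, beta=0.5, theta=0.01, period=1):
        self.make_expert = make_expert
        self.beta = beta        # multiplicative penalty for an expert that errs (illustrative default)
        self.theta = theta      # experts whose weight falls below theta are removed (illustrative default)
        self.period = period    # weights and experts are updated every `period` examples
        self.experts = [make_expert()]
        self.weights = [1.0]
        self.t = 0

    def predict(self, x):
        votes = defaultdict(float)
        for expert, w in zip(self.experts, self.weights):
            pred = expert.predict(x)
            if pred is not None:
                votes[pred] += w
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        """Process one labeled example; return the ensemble's prediction made before training."""
        self.t += 1
        update = (self.t % self.period == 0)
        votes = defaultdict(float)
        for i, expert in enumerate(self.experts):
            pred = expert.predict(x)
            if pred is not None:
                votes[pred] += self.weights[i]
            if update and pred is not None and pred != y:
                self.weights[i] *= self.beta                 # demote experts that err
        global_pred = max(votes, key=votes.get) if votes else None
        if update:
            top = max(self.weights)
            self.weights = [w / top for w in self.weights]   # normalize so the best expert has weight 1
            keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in keep]   # prune low-weight experts
            self.weights = [self.weights[i] for i in keep]
            if global_pred != y:                             # the ensemble erred: add a fresh expert
                self.experts.append(self.make_expert())
                self.weights.append(1.0)
        for expert in self.experts:                          # every expert trains on every example
            expert.learn(x, y)
        return global_pred


# Usage: a small synthetic stream whose target rule flips halfway through.
dwm = DWM(NaiveBayesExpert)
stream = [((0, 1), 1), ((1, 1), 1), ((0, 0), 0)] * 50 + [((0, 1), 0), ((1, 1), 0), ((0, 0), 1)] * 50
correct = sum(dwm.learn(x, y) == y for x, y in stream)
print(f"online accuracy: {correct / len(stream):.2f}")

After the rule flips, the incumbent experts begin to err, their weights decay, and newly created experts trained only on post-drift examples come to dominate the vote, which is the mechanism the abstract credits for tracking drift.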

[1] D. Opitz et al. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res., 1999.

[2] Leo Breiman et al. Bagging Predictors. Machine Learning, 1996.

[3] Ryszard S. Michalski et al. Incremental Generation of VL1 Hypotheses: The Underlying Methodology and the Description of Program AQ11, 1983.

[4] David W. Aha et al. Instance-Based Learning Algorithms. Machine Learning, 1991.

[5] Geoff Hulten et al. Mining high-speed data streams. KDD '00, 2000.

[6] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees, 2000.

[7] Ron Kohavi et al. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. KDD, 1996.

[8] William Nick Street et al. A streaming ensemble algorithm (SEA) for large-scale classification. KDD '01, 2001.

[9] Salvatore J. Stolfo et al. The application of AdaBoost for distributed, scalable and on-line learning. KDD '99, 1999.

[10] Aiko M. Hormann et al. Programs for Machine Learning, Part I. Inf. Control., 1962.

[11] Pat Langley et al. Estimating Continuous Distributions in Bayesian Classifiers. UAI, 1995.

[12] J. Ross Quinlan et al. C4.5: Programs for Machine Learning, 1992.

[13] Ryszard S. Michalski et al. Incremental Learning with Partial Instance Memory. ISMIS, 2002.

[14] Gerhard Widmer et al. Learning in the presence of concept drift and hidden contexts. Machine Learning, 2004.

[15] Eric Bauer et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning, 1999.

[16] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[17] Marcus A. Maloof et al. Incremental rule learning with partial instance memory for changing concepts. Proceedings of the International Joint Conference on Neural Networks, 2003.

[18] Ryszard S. Michalski et al. Selecting Examples for Partial Memory Learning. Machine Learning, 2000.

[19] Avrim Blum et al. Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain. Machine Learning, 2004.

[20] Ryszard S. Michalski et al. On the Quasi-Minimal Solution of the General Covering Problem, 1969.

[21] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. 28th Annual Symposium on Foundations of Computer Science, 1987.

[22] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning, 2000.

[23] Manfred K. Warmuth et al. The weighted majority algorithm. 30th Annual Symposium on Foundations of Computer Science, 1989.

[24] N. Fisher et al. Probability Inequalities for Sums of Bounded Random Variables, 1994.

[25] Douglas H. Fisher et al. A Case Study of Incremental Concept Induction. AAAI, 1986.

[26] Wolfgang Wahlster et al. Readings in Intelligent User Interfaces, 1998.

[27] Catherine Blake et al. UCI Repository of machine learning databases, 1998.

[28] David W. Opitz et al. Feature Selection for Ensembles. AAAI/IAAI, 1999.

[29] Robert Givan et al. Online Ensemble Learning: An Empirical Study. Machine Learning, 2000.

[30] Paul E. Utgoff et al. Decision Tree Induction Based on Efficient Tree Restructuring. Machine Learning, 1997.

[31] Geoff Hulten et al. Mining time-changing data streams. KDD '01, 2001.

[32] Yoav Freund et al. Experiments with a New Boosting Algorithm. ICML, 1996.

[33] Pat Langley et al. Elements of Machine Learning, 1995.

[34] Richard Granger et al. Beyond Incremental Processing: Tracking Concept Drift. AAAI, 1986.

[35] Rajeev Motwani et al. Dynamic itemset counting and implication rules for market basket data. SIGMOD '97, 1997.

[36] Christopher J. Merz et al. UCI Repository of Machine Learning Databases, 1996.

[37] David W. Opitz et al. An Empirical Evaluation of Bagging and Boosting. AAAI/IAAI, 1997.