Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts

We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (*DWM*), uses four mechanisms to cope with concept drift: it trains the online learners of the ensemble, it weights those learners based on their individual performance, it removes learners, again based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation, consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like, we concluded that *DWM* outperformed learners that only incrementally learn concept descriptions, learners that maintain and use previously encountered examples, and learners that employ an unweighted, fixed-size ensemble of experts.
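To make the four mechanisms concrete, the sketch below implements the *DWM* loop in Python. It is a minimal sketch, not the authors' implementation: the parameters `beta` (multiplicative penalty), `theta` (removal threshold), and `period` follow the paper's β, θ, and p, while `MajorityClassLearner` is a hypothetical toy base learner standing in for the incremental naive Bayes or decision-tree learners used in the experiments.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far.

    Hypothetical stand-in for the incremental base learners (e.g., naive
    Bayes or incremental decision trees) that the paper pairs with DWM.
    """

    def __init__(self):
        self.counts = Counter()

    def train(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class DWM:
    def __init__(self, make_expert, beta=0.5, theta=0.01, period=1):
        self.make_expert = make_expert  # factory that builds a fresh base learner
        self.beta = beta      # beta: multiplicative penalty for a mistaken expert
        self.theta = theta    # theta: weight below which an expert is removed
        self.period = period  # p: update weights and membership every p examples
        self.experts = [make_expert()]
        self.weights = [1.0]
        self.t = 0

    def predict_and_train(self, x, y):
        self.t += 1
        update = (self.t % self.period == 0)
        votes = Counter()
        for j, expert in enumerate(self.experts):
            pred = expert.predict(x)
            if pred is not None:
                if update and pred != y:
                    self.weights[j] *= self.beta   # mechanism 2: down-weight mistakes
                votes[pred] += self.weights[j]     # weighted-majority vote
        global_pred = votes.most_common(1)[0][0] if votes else None
        if update:
            # Normalize so the best expert has weight 1, then prune weak
            # experts (mechanism 3); at least one expert always survives.
            w_max = max(self.weights)
            self.weights = [w / w_max for w in self.weights]
            keep = [j for j, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[j] for j in keep]
            self.weights = [self.weights[j] for j in keep]
            # Mechanism 4: add a fresh expert whenever the ensemble itself errs.
            if global_pred != y:
                self.experts.append(self.make_expert())
                self.weights.append(1.0)
        for expert in self.experts:
            expert.train(x, y)   # mechanism 1: every expert learns online
        return global_pred
```

The multiplicative weight update is inherited from the weighted majority algorithm; what distinguishes *DWM* is that it also creates and prunes experts, which lets the ensemble recover after a concept change rather than merely down-weighting outdated experts.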
