A novel online ensemble approach to handle concept drifting data streams: diversified dynamic weighted majority

We present an online ensemble approach, diversified dynamic weighted majority (DDWM), to classify new data instances whose underlying concept distributions vary over time. Our approach maintains two weighted ensembles that differ in their level of diversity. An expert in either ensemble is updated or removed according to its classification accuracy, and a new expert is added based on the final global prediction of the algorithm and the global prediction of the ensemble for a given data instance. Experimental evaluation on various artificial and real-world datasets shows that DDWM achieves very high accuracy in classifying new data instances, irrespective of dataset size, type of drift, or presence of noise. We compare DDWM with other learners in terms of additional performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. Our approach proved highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
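The update cycle described above (penalise, prune, add) can be illustrated with a minimal sketch. It is not the DDWM algorithm itself, since the abstract does not specify the details. The sketch assumes a generic dynamic-weighted-majority style rule: an expert's weight is multiplied by a factor `beta` when it errs, experts whose weight falls below `threshold` are removed, and a new expert is added only when both the ensemble's own prediction and the final combined prediction are wrong. The class names, the toy base learner, the parameter values, and the use of different `beta` values as a stand-in for the two diversity levels are all hypothetical.

```python
from collections import Counter


class MajorityClassExpert:
    """Toy base learner (an assumption): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class WeightedEnsemble:
    """Weighted-majority ensemble with an assumed penalise/prune/add rule."""

    def __init__(self, beta=0.5, threshold=0.01):
        self.beta = beta            # hypothetical weight penalty for a mistake
        self.threshold = threshold  # hypothetical removal threshold
        self.experts = [MajorityClassExpert()]
        self.weights = [1.0]

    def predict(self, x):
        # Weighted vote over the experts' predictions.
        votes = Counter()
        for expert, weight in zip(self.experts, self.weights):
            label = expert.predict(x)
            if label is not None:
                votes[label] += weight
        return votes.most_common(1)[0][0] if votes else None

    def update(self, x, y, final_prediction):
        ensemble_prediction = self.predict(x)
        # Penalise every expert that misclassified the instance.
        for i, expert in enumerate(self.experts):
            if expert.predict(x) != y:
                self.weights[i] *= self.beta
        # Rescale so the best expert has weight 1, then prune weak experts.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]
        kept = [(e, w) for e, w in zip(self.experts, self.weights)
                if w >= self.threshold]
        self.experts = [e for e, _ in kept]
        self.weights = [w for _, w in kept]
        # Assumed addition rule: add an expert only if both the ensemble's
        # prediction and the final combined prediction were wrong.
        if ensemble_prediction != y and final_prediction != y:
            self.experts.append(MajorityClassExpert())
            self.weights.append(1.0)
        # Train all experts on the labelled instance.
        for expert in self.experts:
            expert.learn(x, y)


def final_prediction(ensembles, x):
    """Combine the ensembles' predictions by an unweighted vote (an assumption)."""
    votes = Counter()
    for ensemble in ensembles:
        label = ensemble.predict(x)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


def run_stream(labels):
    """Process a label stream with two ensembles that differ only in beta."""
    ensembles = [WeightedEnsemble(beta=0.5), WeightedEnsemble(beta=0.9)]
    for y in labels:
        x = None  # the toy expert ignores features
        combined = final_prediction(ensembles, x)
        for ensemble in ensembles:
            ensemble.update(x, y, combined)
    return ensembles
```

Calling `run_stream(["a"] * 5 + ["b"] * 20)` simulates an abrupt drift from one concept to another. After the drift, experts trained on the old concept lose weight while newly added experts take over the vote.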
