EACD: evolutionary adaptation to concept drifts in data streams

This paper presents a novel ensemble learning method based on evolutionary algorithms to cope with different types of concept drift in non-stationary data stream classification tasks. In ensemble learning, multiple learners are trained and combined to obtain better predictive performance than a single learner, especially in non-stationary environments, where data evolve over time. The evolution of a data stream can be viewed as a changing-environment problem, and evolutionary algorithms offer a natural solution to it. The proposed method draws random subspaces of features from a pool of features to create different classification types in the ensemble. Each type consists of a limited number of classifiers (decision trees) that have been built at different times over the data stream. An evolutionary algorithm (replicator dynamics) is used to adapt to different concept drifts; it allows the types with higher performance to grow in size and those with lower performance to shrink. A genetic algorithm is then applied to build a two-layer architecture on top of this technique, dynamically optimising the combination of features in each type to achieve better adaptation to new concepts. The proposed method, called EACD, offers both implicit and explicit mechanisms to deal with concept drifts. A set of experiments employing four artificial and five real-world data streams is conducted to compare its performance with that of state-of-the-art algorithms using the immediate and delayed prequential evaluation methods. The results demonstrate favourable performance of the proposed EACD method in different environments.
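The growth and shrinkage of types under replicator dynamics can be illustrated with a minimal sketch. It assumes the standard discrete-time replicator update, in which each type's share of the ensemble is scaled by its fitness relative to the population mean; the abstract does not state the exact update rule used by EACD, so the function name `replicator_step`, the use of accuracy as fitness, and the example values are all illustrative assumptions rather than the authors' implementation.

```python
def replicator_step(proportions, fitness):
    """One discrete-time replicator update (assumed form, not taken from the paper).

    proportions: share of the ensemble held by each type (should sum to 1).
    fitness:     non-negative performance score of each type, e.g. recent accuracy.
    Returns the new shares: x_i * f_i / sum_j(x_j * f_j).
    """
    mean_fitness = sum(x * f for x, f in zip(proportions, fitness))
    if mean_fitness <= 0:
        # No type has positive fitness: leave the shares unchanged.
        return list(proportions)
    return [x * f / mean_fitness for x, f in zip(proportions, fitness)]


# Hypothetical example: four types starting with equal shares.
shares = [0.25, 0.25, 0.25, 0.25]
accuracy = [0.9, 0.6, 0.5, 0.8]  # made-up per-type accuracies
new_shares = replicator_step(shares, accuracy)
```

In this example the types with above-average accuracy (the first and fourth) gain share while the others lose it, and the shares still sum to one. Mapping the shares back to an integer number of trees per type is left out of the sketch.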
