Learning Classification Rules with Differential Evolution for High-Speed Data Stream Mining on GPUs

High-speed data streams are potentially infinite sequences of rapidly arriving instances that may be subject to concept drift. Dedicated learning algorithms must therefore be able to update themselves with new data and provide accurate predictions within a limited amount of time. This requirement has been considered prohibitive for applying evolutionary algorithms to high-speed data stream mining. This paper introduces a massively parallel GPU implementation of a differential evolution algorithm for learning classification rules in the presence of concept drift. The proposal, based on the DE/rand-to-best/1/bin strategy, takes advantage of up to four nested levels of parallelism to maximize the performance of the algorithm. Efficient GPU kernels parallelize the evolution of the populations, the rules, the conditional clauses, and the evaluation on instances. The proposed method is evaluated on 25 data stream benchmarks covering different types of concept drift, and the results are compared with other publicly available streaming rule learners. The results and their statistical analysis show that the proposed classifier offers improved predictive accuracy, model update time, and decision time, together with a compact rule set.
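
For readers unfamiliar with the strategy name, the following is a minimal sequential sketch of DE/rand-to-best/1/bin in Python. It is not the paper's GPU implementation. It assumes the formulation v = x_i + F (x_best - x_i) + F (x_r1 - x_r2) followed by binomial crossover and greedy selection; other texts define rand-to-best with a random base vector instead of x_i. The function name, the parameter defaults, and the toy sphere objective are all hypothetical choices made for illustration, and a real rule learner would replace the objective with a rule-quality measure.

```python
import random


def de_rand_to_best_1_bin(fitness, dim, pop_size=20, F=0.5, CR=0.9,
                          generations=50, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with a sequential DE/rand-to-best/1/bin sketch.

    Assumed mutation: v = x_i + F*(x_best - x_i) + F*(x_r1 - x_r2).
    All names and defaults here are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]

    for _ in range(generations):
        # Best individual at the start of this generation.
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):
            # Two distinct random individuals, both different from i.
            r1, r2 = rng.sample([k for k in range(pop_size) if k != i], 2)
            j_rand = rng.randrange(dim)  # guarantees one mutated gene
            trial = list(pop[i])
            for j in range(dim):
                # Binomial ("bin") crossover: take the mutant gene with
                # probability CR, and always at position j_rand.
                if rng.random() < CR or j == j_rand:
                    v = (pop[i][j]
                         + F * (best[j] - pop[i][j])
                         + F * (pop[r1][j] - pop[r2][j]))
                    trial[j] = min(hi, max(lo, v))  # clamp to bounds
            f_trial = fitness(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial

    b = min(range(pop_size), key=fit.__getitem__)
    return pop[b], fit[b]


def sphere(x):
    """Toy objective (an assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = de_rand_to_best_1_bin(sphere, dim=3)
    print(solution, value)
```

In this sketch the loop over individuals `i` and the loop over genes `j` are independent within a generation, which suggests why population-level and gene-level work can be mapped to parallel GPU threads; how the paper's kernels actually divide that work is not specified in the abstract.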
