A Framework for Classification in Data Streams Using Multi-strategy Learning

Adaptive online learning algorithms have been successfully applied to fast-evolving data streams. Such streams are susceptible to concept drift, which means that the most suitable type of classifier often changes over time. In this setting, a system that can seamlessly switch to the type of learner currently offering the “best” model is highly valuable. In user profiling for security applications, for example, model adaptation is of the utmost importance. We have implemented a multi-strategy framework, the Tornado environment, which runs multiple diverse classifiers simultaneously for decision making. At any given point in time, the framework selects the learner with the highest current performance and provides its model to the user. As the performance indicator, our implementation uses an Error-Memory-Runtime (EMR) measure that combines the error rate, memory usage and runtime of each classifier. We conducted experiments on synthetic and real-world datasets with the Hoeffding Tree, Naive Bayes, Perceptron, K-Nearest Neighbours and Decision Stumps algorithms. Our results indicate that the environment adapts to changes and continuously selects the best current type of classifier as the data evolve.
