Intelligent Adaptive Ensembles for Data Stream Mining: A High Return on Investment Approach

Online ensemble methods have been highly successful in creating accurate models from data streams that are susceptible to concept drift. The success of data stream mining has enabled diverse users to analyse their data across multiple domains, ranging from monitoring stock markets to analysing network traffic and exploring ATM transactions. Increasingly, data stream mining applications run on mobile devices, utilising the variety of data generated by sensors and network technologies. This has led to a surge of interest in mobile, or so-called pocket, data stream mining, which aims to construct near real-time models. On such devices, however, computational resources are limited, and the analytics must be adapted to match the available resources. In this context, the resulting models should not only be highly accurate and able to adapt swiftly to change; the data mining techniques should also be fast, scalable, and efficient in their resource allocation. It therefore becomes important to consider Return on Investment (ROI) issues such as storage space and memory utilisation. This paper introduces the Adaptive Ensemble Size (AES) algorithm, an extension of the Online Bagging method, to address this issue. Our AES method dynamically adapts the ensemble size based on the most recent memory usage requirements. Our results, comparing the AES algorithm with the state of the art, indicate that a high ROI can be obtained without compromising the accuracy of the results.
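To make the core idea concrete, the sketch below combines Online Bagging (Oza and Russell), where each ensemble member sees every instance with a Poisson(1) weight, with an adaptive ensemble size. This is a minimal illustration only: the base learner is a toy majority-class predictor standing in for a stream classifier such as a Hoeffding tree, and the resize rule (shrink when an assumed abstract memory budget is exceeded, grow back when usage falls well below it) is an assumption for illustration, not the paper's exact AES policy.

```python
import random
from collections import Counter


class MajorityClassLearner:
    """Toy base learner: predicts the class seen most often.

    A stand-in for a real stream learner (e.g. a Hoeffding tree)."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y, weight=1):
        self.counts[y] += weight

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class AdaptiveSizeOnlineBagging:
    """Online Bagging with an adaptive ensemble size (illustrative sketch).

    The memory budget and the shrink/grow thresholds are assumed values,
    not the AES parameters from the paper."""

    def __init__(self, min_size=2, max_size=10, memory_budget=1000, seed=42):
        self.min_size, self.max_size = min_size, max_size
        self.memory_budget = memory_budget  # abstract "memory units"
        self.rng = random.Random(seed)
        self.members = [MajorityClassLearner() for _ in range(max_size)]

    def _poisson1(self):
        # Sample Poisson(lambda = 1) via Knuth's method, as in Online Bagging.
        L, k, p = 2.718281828 ** -1, 0, 1.0
        while p > L:
            k += 1
            p *= self.rng.random()
        return k - 1

    def _memory_usage(self):
        # Crude proxy for real memory measurement: total stored statistics.
        return sum(len(m.counts) for m in self.members)

    def learn(self, x, y):
        # Each member trains on the instance with a Poisson(1) weight.
        for m in self.members:
            k = self._poisson1()
            if k > 0:
                m.learn(x, y, weight=k)
        # Adapt the ensemble size to the most recent memory usage.
        usage = self._memory_usage()
        if usage > self.memory_budget and len(self.members) > self.min_size:
            self.members.pop()  # over budget: drop a member
        elif usage < self.memory_budget // 2 and len(self.members) < self.max_size:
            self.members.append(MajorityClassLearner())  # headroom: grow back

    def predict(self, x):
        # Unweighted majority vote over the current ensemble.
        preds = [m.predict(x) for m in self.members]
        votes = Counter(p for p in preds if p is not None)
        return votes.most_common(1)[0][0] if votes else None
```

Used on a stream, `learn` is called once per arriving instance, so the ensemble size tracks recent memory pressure rather than a fixed, worst-case allocation; this is the sense in which accuracy is traded against resource cost to raise ROI.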
