Computationally Efficient Rule-Based Classification for Continuous Streaming Data

Advances in hardware and software technologies make it possible to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as they are generated in real time. Data stream classification is one of the most important DSM techniques, as it allows previously unseen data instances to be classified. Unlike traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules, which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits poor computational performance when confronted with continuous attributes. In this work, we propose an approach for dealing with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as a heuristic for building rule terms on continuous attributes. Using eRules as an example, we show that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a level of accuracy similar to that of the original eRules classifier. We term this new version of eRules G-eRules.
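The abstract does not specify how the Gaussian heuristic becomes a rule term. The Python sketch below shows one plausible reading under stated assumptions. For each (continuous attribute, target class) pair it maintains a running mean and variance using Welford's one-pass method, so no sorted list of attribute values has to be stored or rescanned as instances arrive. It then builds a single interval term `lo < x <= hi` centred on the class mean. The interval width of `k` standard deviations is our own simplification and not necessarily the rule used in G-eRules. The names `RunningGaussian`, `gaussian_rule_term`, `covers`, and the parameter `k` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunningGaussian:
    """Running mean/variance for one (attribute, class) pair, via Welford's method."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # One-pass update: O(1) time and memory per streamed value.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def gaussian_rule_term(stats: RunningGaussian, k: float = 1.0) -> Tuple[float, float]:
    """Hypothetical rule term 'lo < x <= hi' = mean +/- k * std (an assumption)."""
    half_width = k * stats.std
    return stats.mean - half_width, stats.mean + half_width


def covers(term: Tuple[float, float], x: float) -> bool:
    """True if value x satisfies the interval term lo < x <= hi."""
    lo, hi = term
    return lo < x <= hi


if __name__ == "__main__":
    g = RunningGaussian()
    for value in [4.0, 5.0, 6.0]:  # attribute values seen for the target class
        g.update(value)
    term = gaussian_rule_term(g, k=1.0)
    print(term)              # (4.0, 6.0)
    print(covers(term, 5.5))  # True
```

Because only three numbers are kept per attribute-class pair, updating the statistics when new instances arrive costs constant time. This is one way a Gaussian summary can avoid repeatedly evaluating every candidate cut point on a continuous attribute.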
