On expressiveness and uncertainty awareness in rule-based classification for data streams

Mining data streams is a core element of Big Data Analytics. It represents the velocity of large datasets, one of the four aspects of Big Data, the other three being volume, variety and veracity. As data streams in, models are constructed using data mining techniques tailored towards continuous and fast model updates. The Hoeffding Inequality has been among the most successful approaches in learning theory for data streams. In this context, it is typically used to provide a statistical bound on the number of examples needed in each step of an incremental learning process. It has been applied to both classification and clustering problems. Despite the success of the Hoeffding Tree classifier and other data stream mining methods, such models fall short of explaining how their results (i.e., classifications) are reached (black boxing). The expressiveness of decision models in data streams is an area of research that has attracted less attention, despite its paramount practical importance. In this paper, we address this issue by adopting the Hoeffding Inequality as an upper bound to build decision rules that can help decision makers with informed predictions (white boxing). We call our novel method Hoeffding Rules, after its use of the Hoeffding Inequality to estimate whether a rule induced from a smaller sample would be of the same quality as a rule induced from a larger sample. The new method makes a number of novel contributions, including handling uncertainty through abstaining, dealing with continuous data through Gaussian statistical modelling, and an algorithm shown experimentally to be fast. We conducted a thorough experimental study using benchmark datasets, showing the efficiency and expressiveness of the proposed technique when compared with the state of the art.
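
To make the role of the Hoeffding Inequality concrete, the sketch below computes the standard Hoeffding bound as it is commonly stated in the data stream literature: for a random variable with range R observed over n independent examples, the observed mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. The function names and parameter values are hypothetical and chosen for illustration; the abstract does not specify how Hoeffding Rules applies the bound during rule induction, so this should be read as a sketch of the underlying inequality only, not of the authors' algorithm.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Return the Hoeffding bound epsilon for n independent observations.

    With probability at least 1 - delta, the true mean of a variable whose
    values span `value_range` is within epsilon of the observed mean.
    (Hypothetical helper, not taken from the paper.)
    """
    if n <= 0:
        raise ValueError("n must be a positive number of examples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Return the smallest n for which the bound is at most `epsilon`.

    Obtained by solving the bound for n and rounding up.
    (Hypothetical helper, not taken from the paper.)
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Assumed illustrative setting: a quality measure in [0, 1] (range 1.0)
    # and a 5% chance of the bound being violated.
    print(hoeffding_epsilon(1.0, 0.05, 200))
    print(examples_needed(1.0, 0.05, 0.1))
```

Because epsilon shrinks with the square root of n, halving the tolerated error requires roughly four times as many examples, which is why a stream learner can commit to a decision early when the observed margin is large and must wait longer when it is small.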
