An effective pattern-based Bayesian classifier for evolving data stream

Abstract Building Bayesian classifiers from large-scale datasets is an active topic in graph-based machine learning. An advanced approach to Bayesian classification exploits patterns mined from the data. However, traditional pattern-based Bayesian classifiers cannot adapt to evolving data streams. To address this, an effective Pattern-based Bayesian classifier for Data Stream (PBDS) is proposed. First, a data-driven lazy learning strategy discovers local frequent patterns for each test record. Second, a summary data structure represents the data compactly and makes finding patterns for each class more efficient. The extracted patterns are then evaluated using greedy search and minimum description length combined with a Bayesian network. Experimental studies on real-world and synthetic data streams show that PBDS outperforms most state-of-the-art data stream classifiers.
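
The lazy, per-test-record pattern discovery mentioned in the abstract can be illustrated with a minimal sketch. This is not the PBDS algorithm itself: the sliding window, the function name `local_frequent_patterns`, the `min_support` and `max_len` parameters, and the toy stream are all assumptions introduced for illustration. The sketch only shows the general idea of projecting recent records onto the items of one test record and counting, per class, the itemsets that remain.

```python
from collections import Counter, deque
from itertools import combinations


def local_frequent_patterns(window, test_record, min_support=2, max_len=2):
    """Return, per class, the itemsets shared with test_record that are frequent.

    Hypothetical helper: items are (attribute index, value) pairs, and an
    itemset is frequent when at least min_support window records of that
    class contain it.
    """
    test_items = set(enumerate(test_record))
    counts = {}  # class label -> Counter of itemsets
    for record, label in window:
        # Lazy projection: keep only the items this record shares with the test record.
        shared = sorted(item for item in enumerate(record) if item in test_items)
        per_class = counts.setdefault(label, Counter())
        for size in range(1, max_len + 1):
            for itemset in combinations(shared, size):
                per_class[itemset] += 1
    return {
        label: {p: c for p, c in per_class.items() if c >= min_support}
        for label, per_class in counts.items()
    }


# Assumed toy stream: (attribute values, class label). The bounded deque
# stands in for a sliding window, so the oldest record is forgotten.
window = deque(maxlen=4)
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "hot"), "yes"),
]
for record, label in stream:
    window.append((record, label))

patterns = local_frequent_patterns(window, ("sunny", "hot"))
print(patterns)
```

Because patterns are mined only from items that appear in the test record, the search space stays small, and because the window is bounded, older records stop influencing the counts as the stream evolves. How PBDS actually summarizes the stream and scores patterns is not specified in the abstract and is not modeled here.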
