A New Method for Data Stream Mining Based on the Misclassification Error

In this paper, a new method for constructing decision trees for stream data is proposed. First, a new splitting criterion based on the misclassification error is derived. A theorem is proven showing that the best attribute computed in the considered node from the available data sample is, with high probability, the same as the attribute that would be chosen from the whole infinite data stream. Next, this result is combined with the splitting criterion based on the Gini index. It is shown that such a combination provides the highest accuracy among all studied algorithms.
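
As a rough illustration of the two impurity measures named in the abstract, the following Python sketch computes the misclassification error and the Gini index of a set of class labels, together with the impurity reduction of a binary split. The function names are hypothetical, and the code uses only the textbook definitions of these measures; it does not reproduce the paper's splitting criterion or its probabilistic bound.

```python
from collections import Counter


def misclassification_error(labels):
    """Fraction of samples not in the majority class (0.0 for an empty node)."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    return 1.0 - max(counts.values()) / len(labels)


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(labels, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(labels)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - weighted


if __name__ == "__main__":
    # Hypothetical node with four samples and a candidate binary split.
    parent = ["a", "a", "a", "b"]
    left, right = ["a", "a"], ["a", "b"]
    print(split_gain(parent, left, right, misclassification_error))
    print(split_gain(parent, left, right, gini_index))
```

In this toy split the misclassification error reports no reduction while the Gini index reports a positive one, which is one commonly cited reason for considering the two measures together.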
