Concept Drifting Detection on Noisy Streaming Data in Random Ensemble Decision Trees

Although many inductive learning algorithms have been developed for concept-drifting data streams, especially those built on ensemble classification models, few can detect the different types of concept drift in noisy streaming data with low time and space overhead. Motivated by this, this paper proposes a new classification algorithm for Concept drifting Detection based on an ensemble model of Random Decision Trees (called CDRDT). Extensive studies with synthetic and real streaming data demonstrate that, in comparison to several representative classification algorithms for concept-drifting data streams, CDRDT not only detects potential concept changes in noisy data streams effectively and efficiently, but also performs much better in runtime and space consumption while improving predictive accuracy. Thus, the proposed algorithm provides a significant, lightweight reference for classifying noisy concept-drifting data streams.
