Decision Tree for Dynamic and Uncertain Data Streams

Current research on data stream classification focuses mainly on certain data, for which precise and definite values are usually assumed. However, uncertain data arise naturally in real-world applications for various reasons, including imprecise measurement, repeated sampling, and network errors. In this paper, we focus on uncertain data stream classification. Building on CVFDT and DTU, we propose the UCVFDT (Uncertainty-handling and Concept-adapting Very Fast Decision Tree) algorithm, which not only retains the ability of CVFDT to cope with concept drift at high speed, but also adds the ability to handle data with uncertain attributes. An experimental study shows that the proposed UCVFDT algorithm is effective in classifying dynamic data streams with uncertain numerical attributes and is computationally efficient.