Efficient building algorithms of decision tree for uniformly distributed uncertain data

Developing algorithms for uncertain data is one of the most active themes in the data mining community. A number of decision tree classifiers have been studied for handling uncertain data, and this paper extends that work. We develop a tree-pruning algorithm, grounded in probability theory, that uses the sum of tuple fractions. We find that pruning improves both the accuracy of the classifier and the efficiency of building the decision tree. We also find that, under a uniform distribution, increasing the sampling density of an uncertain attribute value contributes little to accuracy while making the computation more costly. We therefore propose a new sampling method, which greatly decreases the execution time of building the decision tree.
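To illustrate what a "tuple fraction" can look like when an attribute value is uniformly distributed, the following is a minimal sketch and not the algorithm of this paper. It assumes each uncertain value is given as an interval [lo, hi] with a uniform density, so the share of a tuple falling at or below a split point has a closed form and needs no sampling at all. The names `left_fraction`, `entropy`, and `split_entropy`, and the tuple layout `(lo, hi, label, weight)`, are hypothetical choices made for this illustration.

```python
import math
from collections import defaultdict


def left_fraction(lo, hi, split):
    """Probability mass of Uniform[lo, hi] lying at or below `split`."""
    if hi <= lo:  # degenerate interval: treat it as a point value
        return 1.0 if lo <= split else 0.0
    # The uniform CDF is linear on [lo, hi]; clamp it to [0, 1] outside.
    return min(1.0, max(0.0, (split - lo) / (hi - lo)))


def entropy(weights):
    """Shannon entropy (in bits) of a dict mapping label -> fractional count."""
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    return -sum(
        (w / total) * math.log2(w / total) for w in weights.values() if w > 0
    )


def split_entropy(tuples, split):
    """Weighted entropy after splitting fractional tuples at `split`.

    `tuples` is a list of (lo, hi, label, weight) entries (assumed layout).
    Each tuple sends a fraction of its weight to each side of the split.
    """
    left = defaultdict(float)
    right = defaultdict(float)
    for lo, hi, label, weight in tuples:
        p = left_fraction(lo, hi, split)
        left[label] += weight * p
        right[label] += weight * (1.0 - p)
    n_left = sum(left.values())
    n_right = sum(right.values())
    total = n_left + n_right
    if total <= 0:
        return 0.0
    return (n_left / total) * entropy(left) + (n_right / total) * entropy(right)
```

For example, `split_entropy([(0, 2, "A", 1.0), (8, 10, "B", 1.0)], 5)` separates the two classes completely, while a split point at 20 leaves both tuples on the left side and yields an entropy of one bit. Because the uniform CDF is linear, the fractional counts are exact here, which is consistent with the observation that denser sampling adds cost without much benefit in this setting.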
