Parallel learning using decision trees: a novel approach

Decision trees are among the most effective and widely used induction methods and have received a great deal of attention over the past twenty years. When decision tree induction algorithms are applied to uncertain rather than deterministic data, the result is a complete tree that can classify most unseen samples correctly. This tree is then pruned to reduce its classification error and over-fitting. Recent parallel decision tree research has concentrated on handling large databases in a reasonable amount of time. In this paper we present new parallel learning methods that induce a decision tree from overlapping partitions of the training set. Our methods combine multiple induction methods, each running on a different processor. These methods use Kramer's method and the fuzzy mode to control and combine the results of the individual learners in order to generate the final tree. Experimental results show that if the attributes and classes in the training set are uniformly distributed and the training set is not too small, these methods achieve a statistically lower error rate than existing methods.
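The overall scheme described above, partitioning the training set into overlapping subsets, inducing a classifier on each (one per processor), and combining their outputs by a mode-style vote, can be sketched as follows. This is a minimal illustration, not the paper's method: it uses one-attribute decision stumps in place of full decision trees, a plain majority vote in place of the Kramer and fuzzy-mode combination, and all function names are hypothetical.

```python
from collections import Counter

def overlapping_partitions(data, k, overlap):
    # Split `data` into k consecutive chunks, each extended by `overlap`
    # samples into the next chunk, so adjacent partitions share samples.
    n = len(data)
    size = n // k
    parts = []
    for i in range(k):
        start = i * size
        end = n if i == k - 1 else (i + 1) * size + overlap
        parts.append(data[start:min(end, n)])
    return parts

def train_stump(samples):
    # Stand-in learner: a one-level "decision stump" on attribute 0 that
    # maps each observed attribute value to its majority class.
    # samples: list of (attribute_tuple, label) pairs.
    table = {}
    for attrs, label in samples:
        table.setdefault(attrs[0], []).append(label)
    return {v: Counter(labels).most_common(1)[0][0]
            for v, labels in table.items()}

def predict_ensemble(stumps, attrs, default=None):
    # Combine the per-partition learners by the mode (majority vote)
    # of their predictions; learners that never saw the value abstain.
    votes = [s[attrs[0]] for s in stumps if attrs[0] in s]
    if not votes:
        return default
    return Counter(votes).most_common(1)[0][0]

# Usage: 3 overlapping partitions, one stump per partition (the stumps
# could be trained on separate processors), combined by voting.
data = [((0,), 'a')] * 6 + [((1,), 'b')] * 6
stumps = [train_stump(p) for p in overlapping_partitions(data, 3, 2)]
```

In the paper's setting each worker would run a full tree-induction algorithm on its partition, and the combination step would apply the Kramer- and fuzzy-mode-based controllers rather than a simple vote.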
