Fast C4.5

C4.5 is a well-known and widely used machine learning algorithm; however, its runtime performance was sacrificed to accommodate the limited main memory available when it was designed. We present a fast implementation of the C4.5 algorithm, named FC4.5 (Fast C4.5). It organizes the data in novel data structures, uses an indirect bucket sort combined with a bit-parallel technique, and confines the binary search for the cutoff to the narrowest possible range. Together, these techniques enable FC4.5 to greatly accelerate the tree construction process of C4.5. Experiments show that FC4.5 builds the same decision tree as the C4.5 (Release 8) system while running up to 5.8 times faster. FC4.5 also scales well across different kinds of datasets.
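To illustrate the idea of a range-confined binary search for the cutoff, the following is a minimal sketch and not the FC4.5 implementation. It assumes the C4.5 Release 8 convention that a continuous attribute's threshold is moved to the largest training value not exceeding the candidate midpoint. The function name `greatest_value_at_or_below`, the `lo`/`hi` bounds, and the sample values are hypothetical; the abstract does not say how FC4.5 derives its narrowest range.

```python
from bisect import bisect_right


def greatest_value_at_or_below(sorted_values, midpoint, lo=0, hi=None):
    """Return the largest element of sorted_values[lo:hi] that is <= midpoint.

    sorted_values must be in ascending order. lo and hi are hypothetical
    bounds that restrict the binary search to a sub-range already known
    to contain the answer.
    """
    if hi is None:
        hi = len(sorted_values)
    # bisect_right returns the insertion point to the right of any
    # elements equal to midpoint, searching only within [lo, hi).
    idx = bisect_right(sorted_values, midpoint, lo, hi)
    if idx == lo:
        raise ValueError("no value in the given range is <= midpoint")
    return sorted_values[idx - 1]


# Illustrative attribute values, sorted once for the whole training set.
values = [1.0, 2.5, 2.5, 4.0, 7.0, 9.5]

# Search only positions 2..4 rather than the whole array.
cutoff = greatest_value_at_or_below(values, 5.5, lo=2, hi=5)
```

Narrowing `lo` and `hi` reduces the number of comparisons from logarithmic in the full training set to logarithmic in the sub-range, which is the kind of saving the abstract alludes to.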
