Best-Fit Learning Curve Model for the C4.5 Algorithm

Background: Little research has been conducted on how to appropriately describe the (empirical) performance of artificial learners. Ideally, a learning problem would be described by a functional dependency between the data, the learning algorithm's internal specifics, and its performance; unfortunately, no general, restriction-free theory of the performance of arbitrary artificial learners has yet been developed.

Objective: The objective of this paper is to investigate which function most appropriately describes the learning curve produced by the C4.5 algorithm.

Methods: The J48 implementation of the C4.5 algorithm was applied to datasets (n=121) from publicly available repositories (e.g. UCI) using stepwise k-fold cross-validation. First, four different functions (power, linear, logarithmic, exponential) were fit to the measured error rates. Where the fit was statistically significant (n=86), we measured each function's average mean squared error and its rank. A dependent-samples t-test was performed to test whether the differences between mean squared errors were significant, and Wilcoxon's signed-rank test was used to test whether the differences between ranks were significant.

Results: The decision tree's error rate can be successfully modeled by an exponential function. Across the 86 datasets, the exponential function was the best descriptor of the error rate in 64 cases, the power function in 13, the logarithmic function in 3, and the linear function in 6. The average mean squared error across all datasets was 0.052954 for the exponential function, significantly different from the power function (P=0.001) and from the linear function (P<0.001). The results also show that the exponential function's rank differs significantly, at any reasonable threshold (P<0.001), from the rank of every other model.

Conclusion: Our findings are consistent with tests performed in the area of human cognitive performance, e.g. the work of Heathcote et al. (2000), who observed that an exponential function best describes an individual learner. In our case, the individual learner was the C4.5 algorithm, observed across different tasks. The work can be used to forecast and model the future performance of C4.5 when not all data have been used, or when there is a need to obtain more data for better accuracy.
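As a sketch of the model-comparison step described in the Methods section (this is not the paper's actual code; the synthetic learning curve, parameter guesses, and helper names are illustrative), the four candidate functions could be fit to a measured error-rate curve and ranked by mean squared error like this:

```python
# Sketch of the paper's model comparison: fit four candidate learning-curve
# models to an error-rate curve and rank them by mean squared error.
# The data below are synthetic; real curves would come from stepwise
# k-fold cross-validation of C4.5/J48.
import numpy as np
from scipy.optimize import curve_fit

# Candidate models of error rate as a function of training-set size n.
models = {
    "power":       lambda n, a, b: a * n ** (-b),
    "linear":      lambda n, a, b: a - b * n,
    "logarithmic": lambda n, a, b: a - b * np.log(n),
    "exponential": lambda n, a, b: a * np.exp(-b * n),
}

def rank_models(n, err):
    """Fit each model by nonlinear least squares (curve_fit uses
    Levenberg-Marquardt, cf. refs [11] and [23]) and return
    (name, mse) pairs sorted best-first."""
    scores = []
    for name, f in models.items():
        try:
            params, _ = curve_fit(f, n, err, p0=(err[0], 0.01), maxfev=10000)
            mse = float(np.mean((f(n, *params) - err) ** 2))
            scores.append((name, mse))
        except RuntimeError:
            pass  # fit did not converge; skip this model
    return sorted(scores, key=lambda s: s[1])

# Hypothetical learning curve: exponentially decaying error plus noise.
rng = np.random.default_rng(0)
n = np.arange(10, 500, 10, dtype=float)
err = 0.4 * np.exp(-0.01 * n) + rng.normal(0, 0.005, n.size)

ranking = rank_models(n, err)
print(ranking[0][0])  # best-fitting model for this synthetic curve
```

The paper's subsequent step, comparing the per-dataset MSEs and ranks across all 86 datasets, would correspond to applying `scipy.stats.ttest_rel` (dependent-samples t-test) and `scipy.stats.wilcoxon` (signed-rank test) to the collected scores.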

[1] John R. Anderson, et al. Reflections of the Environment in Memory: Form of the Memory Functions, 2022.

[2] Geoffrey J. McLachlan, et al. Analyzing Microarray Gene Expression Data, 2004.

[3] George Argyrous. Statistics for Research with a Guide to SPSS, 2011, Sage Publications.

[4] Casimir A. Kulikowski, et al. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems, 1990.

[5] Scott D. Brown, et al. The Power Law Repealed: The Case for an Exponential Law of Practice, 2000, Psychonomic Bulletin & Review.

[6] Ian Witten, et al. Data Mining, 2000.

[7] Kestutis Ducinskas, et al. Expected Bayes Error Rate in Supervised Classification of Spatial Gaussian Data, 2011, Informatica.

[8] Gintautas Dzemyda, et al. Large-Scale Data Analysis Using Heuristic Methods, 2011, Informatica.

[9] Sotiris B. Kotsiantis, et al. Supervised Machine Learning: A Review of Classification Techniques, 2007, Informatica.

[10] Pat Langley, et al. Static Versus Dynamic Sampling for Data Mining, 1996, KDD.

[11] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters, 1963.

[12] Mark Last, et al. Predicting and Optimizing Classifier Utility with the Power Law, 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[13] J. Ross Quinlan. C4.5: Programs for Machine Learning, 1992.

[14] Matjaz B. Juric, et al. Assessment of Classification Models with Small Amounts of Data, 2007, Informatica.

[15] S.J.J. Smith, et al. Empirical Methods for Artificial Intelligence, 1995.

[16] Tim Oates, et al. Efficient Progressive Sampling, 1999, KDD '99.

[17] Richard B. Anderson. The Power Law as an Emergent Property, 2001, Memory & Cognition.

[18] Ian H. Witten, et al. Data Mining: Practical Machine Learning Tools and Techniques, 2014.

[19] Sameer Singh. Modeling Performance of Different Classification Methods: Deviation from the Power Law, 2005.

[20] H. Abdi. The Bonferroni and Šidák Corrections for Multiple Comparisons, 2006.

[21] Huan Liu, et al. Modelling Classification Performance for Large Data Sets, 2001, WAIM.

[22] Douglas H. Fisher, et al. Modeling Decision Tree Performance with the Power Law, 1999, AISTATS.

[23] Kenneth Levenberg. A Method for the Solution of Certain Non-Linear Problems in Least Squares, 1944.

[24] V. Vapnik. Estimation of Dependences Based on Empirical Data, 2006.