Cross-Validated C4.5: Using Error Estimation for Automatic Parameter Selection

Machine learning algorithms for supervised learning are in wide use. An important issue in the use of these algorithms is how to set the parameters of the algorithm. While the default parameter values may be appropriate for a wide variety of tasks, they are not necessarily optimal for a given task. In this paper, we investigate the use of cross-validation to select parameters for the C4.5 decision tree learning algorithm. Experimental results on five datasets show that when cross-validation is applied to selecting an important parameter for C4.5, the accuracy of the induced trees on independent test sets is generally higher than the accuracy when using the default parameter value.