A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

We review accuracy estimation methods and compare the two most common ones: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation we vary the number of folds and whether the folds are stratified; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
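As a concrete illustration of the two estimators the abstract compares, below is a minimal Python sketch of stratified k-fold cross-validation and the .632 bootstrap accuracy estimate. The `make_clf` factory, the `fit`/`predict` interface, and the function names are assumptions introduced here for illustration; they are not from the paper, whose experiments ran C4.5 and Naive-Bayes via MLC++.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):   # deal each class round-robin across folds
            folds[j % k].append(i)
    return folds

def cv_accuracy(make_clf, X, y, k=10, stratified=True, seed=0):
    """k-fold cross-validation accuracy estimate.
    make_clf is assumed to return a fresh classifier with fit/predict."""
    if stratified:
        folds = stratified_folds(y, k, seed)
    else:
        idxs = list(range(len(y)))
        random.Random(seed).shuffle(idxs)
        folds = [idxs[j::k] for j in range(k)]
    correct = 0
    for fold in folds:
        test = set(fold)
        train = [i for i in range(len(y)) if i not in test]
        clf = make_clf()
        clf.fit([X[i] for i in train], [y[i] for i in train])
        preds = clf.predict([X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(y)

def bootstrap632_accuracy(make_clf, X, y, b=50, seed=0):
    """.632 bootstrap accuracy: weights the out-of-sample accuracy on
    instances left out of each bootstrap sample (0.632) against the
    optimistic resubstitution accuracy (0.368)."""
    rng = random.Random(seed)
    n = len(y)
    # Resubstitution accuracy: train and test on the full dataset.
    full = make_clf()
    full.fit(X, y)
    acc_resub = sum(p == t for p, t in zip(full.predict(X), y)) / n
    acc0 = []
    for _ in range(b):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n with replacement
        out = [i for i in range(n) if i not in set(sample)]
        if not out:  # degenerate sample: every instance was drawn
            continue
        clf = make_clf()
        clf.fit([X[i] for i in sample], [y[i] for i in sample])
        preds = clf.predict([X[i] for i in out])
        acc0.append(sum(p == y[i] for p, i in zip(preds, out)) / len(out))
    return 0.632 * (sum(acc0) / len(acc0)) + 0.368 * acc_resub
```

In the abstract's terms, the experiment varies k and the stratification flag for cross-validation and b for bootstrap; the recommended ten-fold stratified cross-validation corresponds to `cv_accuracy(make_clf, X, y, k=10, stratified=True)`.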
