Selecting a classification method by cross-validation

If we lack relevant problem-specific knowledge, cross-validation methods may be used to select a classification method empirically. We examine this idea here to show in what senses cross-validation does and does not solve the selection problem. As we illustrate empirically, cross-validation may lead to higher average performance than application of any single classification strategy, and it also reduces the risk of poor performance. On the other hand, cross-validation is no more and no less a form of bias than simpler selection strategies, and applying it appropriately ultimately depends on prior knowledge in the same way. In fact, cross-validation may be seen as a way of applying partial information about the applicability of alternative classification strategies.
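The selection procedure discussed above can be sketched concretely: estimate each candidate method's accuracy by k-fold cross-validation, pick the method with the highest estimate, and refit the winner on all the data. The sketch below is a minimal, self-contained illustration under assumed conditions, not the paper's experimental setup: the dataset is a hypothetical noisy 1-D problem, and the two candidate learners (a majority-class predictor and a 1-nearest-neighbor rule) are stand-ins for the classification strategies compared in the paper.

```python
import random

random.seed(0)

# Hypothetical toy dataset: label is 1 when x > 0.5, with 10% label noise.
data = []
for _ in range(200):
    x = random.random()
    y = int(x > 0.5)
    if random.random() < 0.1:
        y = 1 - y
    data.append((x, y))

def majority_fit(train):
    # Predict the most common training label, ignoring x entirely.
    ones = sum(y for _, y in train)
    label = int(2 * ones >= len(train))
    return lambda x: label

def nn1_fit(train):
    # Predict the label of the nearest training point (1-NN).
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def cv_accuracy(fit, data, k=10):
    # k-fold cross-validation: hold out each fold, train on the rest,
    # and pool the held-out predictions into one accuracy estimate.
    n = len(data)
    correct = 0
    for i in range(k):
        test = data[i * n // k:(i + 1) * n // k]
        train = data[:i * n // k] + data[(i + 1) * n // k:]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / n

candidates = {"majority": majority_fit, "1-NN": nn1_fit}
scores = {name: cv_accuracy(fit, data) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
final_model = candidates[best](data)  # refit the chosen method on all the data
```

Note that this composite procedure is itself a classification strategy: the choice of candidate set and of k encodes prior assumptions, which is exactly the sense in which the abstract argues cross-validation does not escape bias.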
