On estimating model accuracy with repeated cross-validation

Evaluation of predictive models is a ubiquitous task in machine learning and data mining. Cross-validation is often used as a means of evaluating such models. There appears to be some confusion among researchers, however, about best practices for cross-validation and about the interpretation of cross-validation results. In particular, repeated cross-validation is often advocated, as is the reporting of standard deviations, confidence intervals, or an indication of "significance". In this paper, we argue that, under many practical circumstances, when the goal of the experiments is to see how well the model returned by a learner will perform in practice in a particular domain, repeated cross-validation is not useful, and the reporting of confidence intervals or significance is misleading. Our arguments are supported by experimental results.
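To make the procedure under discussion concrete, the following is a minimal sketch of repeated k-fold cross-validation; the dataset, learner, and scikit-learn utilities (RepeatedKFold, cross_val_score) are illustrative assumptions, not part of the paper's experimental setup.

```python
# Minimal sketch of repeated 10-fold cross-validation (assumed setup:
# scikit-learn; dataset and learner chosen only for illustration).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
learner = DecisionTreeClassifier(random_state=0)

# The same data set is re-partitioned n_repeats times, and accuracy is
# estimated on every held-out fold of every repetition.
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(learner, X, y, cv=cv, scoring="accuracy")

# Averaging over repetitions reduces the variance caused by the random
# partitioning; the standard deviation over folds, however, does not
# directly measure how far the estimate is from the true accuracy in
# the domain, which is the issue the paper examines.
print(f"mean accuracy over all folds: {scores.mean():.3f}")
print(f"std over folds and repetitions: {scores.std():.3f}")
```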