Reliable Accuracy Estimates from k-Fold Cross Validation

It is popular to evaluate the performance of classification algorithms by k-fold cross validation. A reliable accuracy estimate should have a relatively small variance, and several studies have therefore suggested performing k-fold cross validation repeatedly. Most of them, however, did not account for the correlation among the replications of k-fold cross validation, so the variance can be underestimated. The purpose of this study is to explore whether k-fold cross validation should be repeated to obtain reliable accuracy estimates. The dependency relationships between the predictions of the same instance in two replications of k-fold cross validation are first analyzed for the k-nearest-neighbor classifier with k = 1. Statistical methods are then proposed to test the strength of the dependency between the accuracy estimates resulting from two replications of k-fold cross validation. Experimental results on 20 data sets show that the accuracy estimates obtained from different replications of k-fold cross validation are generally highly correlated, and that the correlation grows as the number of folds increases. Hence, k-fold cross validation with a large number of folds and a small number of replications should be adopted for the performance evaluation of classification algorithms.
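To make the setup concrete, the following is a minimal sketch of repeated k-fold cross validation, not the paper's code: scikit-learn, the breast-cancer data set, and the values k = 10 and r = 2 are illustrative assumptions. The sketch records each instance's prediction in every replication, so the agreement between replications, i.e., the dependency analyzed in the paper, can be inspected alongside the per-replication accuracy estimates.

```python
# Illustrative sketch of repeated k-fold cross validation (not the paper's code).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)   # placeholder data set
k, r = 10, 2                                 # folds per replication, replications
clf = KNeighborsClassifier(n_neighbors=1)    # 1-NN, as analyzed in the paper

preds = np.empty((r, len(y)), dtype=int)     # each instance's prediction per replication
acc = np.empty(r)                            # one accuracy estimate per replication
for rep in range(r):
    # A different shuffle per replication yields a new random partition.
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=rep)
    for train_idx, test_idx in cv.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        preds[rep, test_idx] = clf.predict(X[test_idx])
    acc[rep] = (preds[rep] == y).mean()

print("per-replication accuracies:", acc)
# Fraction of instances predicted identically by the two replications:
print("prediction agreement:", (preds[0] == preds[1]).mean())
```

Because 1-NN is deterministic given a training set, and the training sets across replications overlap heavily, the two replications tend to predict most instances identically; this agreement is the source of the correlation between their accuracy estimates, and it is why averaging over replications reduces the variance less than independence would suggest.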
