Reliable Accuracy Estimates from k-Fold Cross Validation

It is popular to evaluate the performance of classification algorithms by k-fold cross validation. A reliable accuracy estimate should have a relatively small variance, and several studies have therefore suggested performing k-fold cross validation repeatedly. Most of them, however, did not account for the correlation among the replications of k-fold cross validation, so the variance can be underestimated. The purpose of this study is to explore whether k-fold cross validation should be repeated to obtain reliable accuracy estimates. The dependency relationships between the predictions of the same instance in two replications of k-fold cross validation are first analyzed for the k-nearest-neighbor classifier with k = 1. Statistical methods are then proposed to test the strength of the dependency between the accuracy estimates resulting from two replications of k-fold cross validation. Experimental results on 20 data sets show that the accuracy estimates obtained from different replications of k-fold cross validation are generally highly correlated, and that the correlation grows as the number of folds increases. Hence, k-fold cross validation with a large number of folds and a small number of replications should be adopted for the performance evaluation of classification algorithms.
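To make the setup concrete, the following is a minimal sketch of repeated k-fold cross validation, not the paper's code: scikit-learn, the breast-cancer data set, and the values k = 10 and r = 2 are illustrative assumptions. The sketch records each instance's prediction in every replication, so the agreement between replications, i.e., the dependency analyzed in the paper, can be inspected alongside the per-replication accuracy estimates.

```python
# Illustrative sketch of repeated k-fold cross validation (not the paper's code).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)   # placeholder data set
k, r = 10, 2                                 # folds per replication, replications
clf = KNeighborsClassifier(n_neighbors=1)    # 1-NN, as analyzed in the paper

preds = np.empty((r, len(y)), dtype=int)     # each instance's prediction per replication
acc = np.empty(r)                            # one accuracy estimate per replication
for rep in range(r):
    # A different shuffle per replication yields a new random partition.
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=rep)
    for train_idx, test_idx in cv.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        preds[rep, test_idx] = clf.predict(X[test_idx])
    acc[rep] = (preds[rep] == y).mean()

print("per-replication accuracies:", acc)
# Fraction of instances predicted identically by the two replications:
print("prediction agreement:", (preds[0] == preds[1]).mean())
```

Because 1-NN is deterministic given a training set, and the training sets across replications overlap heavily, the two replications tend to predict most instances identically; this agreement is the source of the correlation between their accuracy estimates, and it is why averaging over replications reduces the variance less than independence would suggest.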
