Série Scientifique (Scientific Series): No Unbiased Estimator of the Variance of K-fold Cross-validation

Most machine learning researchers perform quantitative experiments to estimate generalization error and to compare the performance of different algorithms (in particular, their proposed algorithm). To draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of K-fold cross-validation. The analysis that accompanies this result is based on the eigen-decomposition of the covariance matrix of errors, which has only three distinct eigenvalues, corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how naive estimators (which ignore the error correlations due to the overlap between training sets) can grossly underestimate the variance. This is confirmed by numerical experiments in which the three components of the variance are compared as the difficulty of the learning problem and the number of folds are varied.
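The underestimation described above can be seen in a small Monte Carlo simulation. The sketch below is illustrative only and is not the paper's experimental setup: it uses a deliberately simple learner (the training-set mean as predictor) on a pure-noise regression target, repeatedly draws fresh datasets, and compares the true (Monte Carlo) variance of the K-fold CV estimate against the naive variance estimate that treats the n per-example errors as i.i.d.

```python
import numpy as np

rng = np.random.default_rng(0)

def kfold_errors(y, k):
    """Per-example squared errors of k-fold CV, using the
    training-set mean as the predictor (an illustrative learner)."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)
    e = np.empty(n)
    for f in folds:
        train = np.setdiff1d(np.arange(n), f)
        e[f] = (y[f] - y[train].mean()) ** 2
    return e

n, k, trials = 20, 4, 3000
cv_estimates, naive_vars = [], []
for _ in range(trials):
    y = rng.normal(size=n)                # pure-noise target, fresh dataset each trial
    e = kfold_errors(y, k)
    cv_estimates.append(e.mean())         # the K-fold CV estimate of generalization error
    naive_vars.append(e.var(ddof=1) / n)  # naive variance estimate, assuming i.i.d. errors

true_var = np.var(cv_estimates, ddof=1)   # Monte Carlo variance of the CV estimate
naive = np.mean(naive_vars)
print(f"true variance of CV estimate ~ {true_var:.4f}")
print(f"mean naive variance estimate ~ {naive:.4f}")
```

Because errors within a fold share one trained predictor and errors across folds come from overlapping training sets, the per-example errors are positively correlated, so the naive estimate comes out smaller than the true variance of the CV estimate.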
