Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation

In machine learning, the performance of a classifier is usually measured in terms of prediction error. In most real-world problems, the error cannot be calculated exactly and must be estimated, so choosing an appropriate estimator of the error is important. This paper analyzes the statistical properties, bias and variance, of the k-fold cross-validation classification error estimator (k-cv). Our main contribution is a novel theoretical decomposition of the variance of the k-cv into its sources of variance: sensitivity to changes in the training set and sensitivity to changes in the folds. The paper also compares the bias and variance of the estimator for different values of k. The experimental study is performed on artificial domains because they allow the exact computation of the quantities involved and the conditions of the experiments can be rigorously specified. The experiments cover two classifiers (naive Bayes and nearest neighbor), several numbers of folds, sample sizes, and training sets drawn from assorted probability distributions. We conclude with practical recommendations on the use of k-fold cross-validation.
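For intuition, here is a minimal sketch of the k-cv estimator and of how its two sources of variance can be probed empirically. It assumes scikit-learn and NumPy; the function names (estimate_kcv, sample) and the two-class Gaussian toy domain are illustrative choices, not taken from the paper.

```python
# Minimal sketch: k-fold cross-validation error estimation and its two
# empirical sources of variance. Helper names and the toy domain below
# are hypothetical, used only for illustration.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB

def estimate_kcv(X, y, k, seed):
    """k-cv estimate of the prediction error (mean 0-1 loss over the folds)."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    fold_errors = []
    for train_idx, test_idx in kf.split(X):
        clf = GaussianNB().fit(X[train_idx], y[train_idx])
        fold_errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
    return np.mean(fold_errors)

def sample(n, rng):
    # Hypothetical artificial domain: two classes, 2-D Gaussians whose
    # means are shifted by the class label.
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None], scale=1.0, size=(n, 2))
    return X, y

rng = np.random.default_rng(0)
X, y = sample(200, rng)

# Sensitivity to changes in the folds: fix the training set, vary the partition.
fold_var = np.var([estimate_kcv(X, y, k=10, seed=s) for s in range(30)])

# Sensitivity to changes in the training set: fresh samples, fixed partition seed.
train_var = np.var([estimate_kcv(*sample(200, rng), k=10, seed=0) for _ in range(30)])

print(f"variance across fold partitions: {fold_var:.6f}")
print(f"variance across training sets:   {train_var:.6f}")
```

Averaging the estimator over several independent fold partitions (repeated k-cv) can reduce the first source of variance but not the second, which is one practical reason to separate the two terms as the decomposition does.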
