An a Priori Exponential Tail Bound for k-Folds Cross-Validation

We consider a priori generalization bounds developed in terms of cross-validation estimates and the stability of learners. In particular, we first derive an exponential Efron-Stein type tail inequality for the concentration of a general function of n independent random variables. Next, under a reasonable notion of stability, we use this exponential tail bound to analyze the concentration of the k-fold cross-validation (KFCV) estimate around the true risk of a hypothesis generated by a general learning rule. While the literature has often attributed this concentration to the bias and variance of the estimator, our bound attributes it to the stability of the learning rule and the number of folds k. This insight raises valid concerns about the practical use of KFCV and suggests research directions for obtaining reliable empirical estimates of the actual risk.
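
As a point of reference for the first step, one representative exponential Efron-Stein type inequality, in the style of the entropy method of Boucheron, Lugosi, and Massart, reads as follows; the paper's own statement may differ in its constants and conditions. Let Z = f(X_1, ..., X_n) for independent X_1, ..., X_n, and let Z^{(i)} denote Z recomputed with X_i replaced by an independent copy X_i'. With

\[
V^{+} \;=\; \mathbb{E}\!\left[\sum_{i=1}^{n} \big(Z - Z^{(i)}\big)_{+}^{2} \,\middle|\, X_1, \dots, X_n\right],
\]

if V^{+} \le c almost surely for a constant c > 0, then for all t > 0,

\[
\Pr\!\big\{Z - \mathbb{E}[Z] \ge t\big\} \;\le\; \exp\!\left(-\frac{t^{2}}{2c}\right).
\]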

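To fix ideas about the quantity being analyzed, here is a minimal Python sketch of the KFCV estimate whose concentration around the true risk the bound controls. The `fit` and `loss` callables are hypothetical stand-ins for a general learning rule and loss function, not part of the paper.

```python
import numpy as np

def kfcv_estimate(X, y, fit, loss, k=10, seed=0):
    """k-fold cross-validation (KFCV) risk estimate.

    fit(X_tr, y_tr) -> predictor h, where h(X) returns predictions
    loss(y_true, y_pred) -> array of per-example losses
    (Both callables are hypothetical placeholders.)
    """
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)           # k disjoint validation folds
    fold_risks = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)      # train on the other k - 1 folds
        h = fit(X[train], y[train])          # hypothesis from the learning rule
        fold_risks.append(loss(y[fold], h(X[fold])).mean())
    # the KFCV estimate averages the k held-out empirical risks
    return float(np.mean(fold_risks))
```

Under the paper's stability assumption, the tail bound quantifies how far this average deviates from the true risk of the hypothesis the rule produces, with the deviation governed by the rule's stability and the number of folds k.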