Concentration inequalities of the cross-validation estimator for empirical risk minimizers

ABSTRACT We derive concentration inequalities for the cross-validation estimate of the generalization error of empirical risk minimizers. In the general setting, we show that the worst-case error of this estimate is not much worse than that of the training error estimate; see Kearns M, Ron D. [Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 1999;11:1427–1453]. General loss functions and classes of predictors with finite VC dimension are considered. Our focus is on proving the consistency of the various cross-validation procedures, and we compare the procedures in terms of their rates of convergence. An interesting consequence is that the size of the test sample is not required to grow to infinity for the cross-validation procedure to be consistent.
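To fix ideas, here is a minimal benchmark computation in illustrative notation of our own, not the paper's: let $Z_1,\dots,Z_n$ be i.i.d., let the loss $\ell$ take values in $[0,1]$, split the sample into a training part of size $n-p$ and a test part of size $p$, and let $\hat f_{n-p}$ denote the empirical risk minimizer fitted on the training part, with risk $R(f)=\mathbb{E}\,\ell(f,Z)$. Hoeffding's inequality [1], applied conditionally on the training part, gives the classical hold-out bound

\[
\widehat{R}_p(\hat f_{n-p}) \;=\; \frac{1}{p}\sum_{i=n-p+1}^{n} \ell\big(\hat f_{n-p}, Z_i\big),
\qquad
\mathbb{P}\Big(\big|\widehat{R}_p(\hat f_{n-p}) - R(\hat f_{n-p})\big| \ge \varepsilon \;\Big|\; \hat f_{n-p}\Big) \;\le\; 2\exp\!\left(-2p\varepsilon^2\right).
\]

Taken on its own, this bound suggests that the test size $p$ must grow for the estimate to concentrate; the abstract's last claim is that, for empirical risk minimizers over a class of finite VC dimension, the cross-validation estimate can remain consistent without letting $p$ tend to infinity.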

[1] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 1963.

[2] P. A. Lachenbruch and M. R. Mickey. Estimation of error rates in discriminant analysis. Technometrics, 1968.

[3] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971.

[4] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B, 1974.

[5] D. M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 1974.

[6] S. Geisser. The predictive sample reuse method with applications. Journal of the American Statistical Association, 1975.

[7] P. J. McCarthy. The use of balanced half-sample replication in cross-validation studies. Journal of the American Statistical Association, 1976.

[8] M. Stone. Asymptotics for and against cross-validation. Biometrika, 1977.

[9] L. Devroye and T. J. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 1979.

[10] L. Devroye and T. J. Wagner. Distribution-free inequalities for the deleted and holdout error estimates. IEEE Transactions on Information Theory, 1979.

[11] R. R. Picard and R. D. Cook. Cross-validation of regression models. Journal of the American Statistical Association, 1984.

[12] K.-C. Li. Asymptotic optimality for $C_p$, $C_L$, cross-validation and generalized cross-validation: discrete index set. Annals of Statistics, 1987.

[13] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, 1989.

[14] P. Burman. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika, 1989.

[15] L. Breiman and P. Spector. Submodel selection and evaluation in regression: the X-random case. International Statistical Review, 1992.

[16] P. Zhang. Model selection via multifold cross validation. Annals of Statistics, 1993.

[17] J. Shao. Linear model selection by cross-validation. Journal of the American Statistical Association, 1993.

[18] B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.

[19] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.

[20] A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996.

[21] S. B. Holden. PAC-like upper bounds for the sample complexity of leave-one-out cross-validation. In COLT '96, 1996.

[22] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[23] M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 1999.

[24] A. Blum, A. Kalai, and J. Langford. Beating the hold-out: bounds for K-fold and progressive cross-validation. In COLT '99, 1999.

[25] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.

[26] O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In NIPS, 2000.

[27] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 2002.

[28] T. Zhang. A leave-one-out cross validation bound for kernel methods with applications in learning. In COLT/EuroCOLT, 2001.

[29] S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In UAI, 2002.

[30] O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2002.

[31] S. Kutin. Extensions to McDiarmid's inequality when differences are bounded with high probability. Technical report, University of Chicago, 2002.

[32] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, 2002.

[33] Y. Bengio and Y. Grandvalet. No unbiased estimator of the variance of K-fold cross-validation. Journal of Machine Learning Research, 2004.

[34] A. M. Molinaro, S. Dudoit, and M. J. van der Laan. Loss-based estimation with cross-validation: applications to microarray data analysis. SIGKDD Explorations, 2003.

[35] S. Dudoit and M. J. van der Laan. Asymptotics of cross-validated risk estimation in model selection and performance assessment. Technical report, 2003.

[36] M. Kearns, Y. Mansour, A. Y. Ng, and D. Ron. An experimental and theoretical comparison of model selection methods. In COLT '95, 1995.

[37] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

[38] M. Markatou, H. Tian, S. Biswas, and G. Hripcsak. Analysis of variance of cross-validation estimators of the generalization error. Journal of Machine Learning Research, 2005.

[39] S. Dudoit and M. J. van der Laan. Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Statistical Methodology, 2005.

[40] V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer, 2006.

[41] Y. Yang. Consistency of cross validation for comparing regression procedures. Annals of Statistics, 2007.

[42] G. Gnecco and M. Sanguineti. Approximation error bounds via Rademacher's complexity. Applied Mathematical Sciences, 2008.

[43] S. Arlot. Model selection by resampling penalization. Electronic Journal of Statistics, 2009.

[44] V. Koltchinskii. Rademacher complexities and bounding the excess risk in active learning. Journal of Machine Learning Research, 2010.

[45] W.-Y. Loh. Classification and regression trees. WIREs Data Mining and Knowledge Discovery, 2011.

[46] E. B. Laber and S. A. Murphy. Adaptive confidence intervals for the test error in classification. Journal of the American Statistical Association, 2011.