Exact combinatorial bounds on the probability of overfitting for empirical risk minimization

Three general methods for obtaining exact bounds on the probability of overfitting are proposed within statistical learning theory: a method of generating and destroying sets, a recurrent method, and a blockwise method. Their application is illustrated on six model sets of predictors: a pair of predictors, a layer of a Boolean cube, an interval of a Boolean cube, a monotonic chain, a unimodal chain, and a unit neighborhood of the best predictor. For the interval and the unimodal chain, numerical experiments demonstrate the effects of splitting and similarity on the probability of overfitting.
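To make the bounded quantity concrete, here is a minimal brute-force sketch. It rests on assumptions the abstract does not state: each predictor is a binary error vector over a finite full sample, the sample is split uniformly at random into training and test parts, and overfitting means that the test error rate of the predictor chosen by empirical risk minimization exceeds its training error rate by at least eps. The function name overfitting_probability, the parameters train_size and eps, and the pessimistic tie-breaking rule are hypothetical choices made for illustration, not the paper's method.

```python
from fractions import Fraction
from itertools import combinations


def overfitting_probability(error_vectors, train_size, eps):
    """Exact probability of overfitting by enumerating all train/test splits.

    error_vectors: list of 0/1 lists, one per predictor (1 = error on that object).
    train_size:    number of objects placed in the training part.
    eps:           threshold on (test error rate - training error rate).
    """
    n = len(error_vectors[0])
    test_size = n - train_size
    total = 0
    overfit = 0
    for train in combinations(range(n), train_size):
        best = None
        for vec in error_vectors:
            train_err = sum(vec[i] for i in train)
            test_err = sum(vec) - train_err
            # ERM: fewest training errors; ties are broken pessimistically
            # (assumption) in favour of the predictor with more test errors.
            key = (train_err, -test_err)
            if best is None or key < best[0]:
                best = (key, train_err, test_err)
        _, train_err, test_err = best
        gap = Fraction(test_err, test_size) - Fraction(train_err, train_size)
        total += 1
        if gap >= eps:
            overfit += 1
    return Fraction(overfit, total)


# A pair of predictors on 4 objects, each making one error on a different object.
pair = [[1, 0, 0, 0], [0, 1, 0, 0]]
print(overfitting_probability(pair, train_size=2, eps=Fraction(1, 2)))      # 5/6
# A single predictor with one error, for comparison.
print(overfitting_probability(pair[:1], train_size=2, eps=Fraction(1, 2)))  # 1/2
```

Enumeration is feasible only for very small samples, since the number of splits grows combinatorially. In this toy case, adding a second predictor raises the probability from 1/2 to 5/6.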
