A BETTER VARIANCE CONTROL FOR PAC-BAYESIAN CLASSIFICATION

A standard way to understand and improve classification rules is to prove bounds on their generalization error. Here we provide localized data-based PAC bounds for the difference between the risks of any two randomized estimators. We derive two types of algorithms from these bounds: the first uses combinatorial techniques and is related to compression schemes, whereas the second involves Gibbs estimators. We also recover some results of Vapnik-Chervonenkis theory and improve them by taking into account the variance term measured by the pseudo-distance $(f_1, f_2) \mapsto P[f_1(X) \neq f_2(X)]$. Finally, we present different ways of localizing the results in order to improve the bounds and make them less dependent on the choice of the prior. For some classes of functions (such as VC classes), this gains a logarithmic factor without using the chaining technique (see [1] for more details).
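As an illustration only, and not taken from the paper, the pseudo-distance above can be estimated on a sample by the fraction of points on which two classifiers disagree. The sketch below assumes classifiers are plain Python callables; the name `empirical_disagreement` and the two threshold classifiers are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def empirical_disagreement(
    f1: Callable[[X], int],
    f2: Callable[[X], int],
    xs: Sequence[X],
) -> float:
    """Fraction of sample points where f1 and f2 predict different labels.

    This is the empirical counterpart of P[f1(X) != f2(X)].
    """
    if len(xs) == 0:
        raise ValueError("sample must be non-empty")
    return sum(1 for x in xs if f1(x) != f2(x)) / len(xs)


# Two hypothetical threshold classifiers on the real line.
def f1(x: float) -> int:
    return int(x > 0.0)


def f2(x: float) -> int:
    return int(x > 0.5)


sample = [-1.0, 0.25, 0.4, 0.75, 2.0]
rate = empirical_disagreement(f1, f2, sample)  # disagree only at 0.25 and 0.4
```

Two classifiers that rarely disagree have a small pseudo-distance, which is the sense in which the variance term can tighten a bound on the difference of their risks.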

[1]  Luc Devroye, et al. Lower bounds in pattern recognition and learning, 1995, Pattern Recognit.

[2]  Peter L. Bartlett, et al. Efficient agnostic learning of neural networks with bounded fan-in, 1996, IEEE Trans. Inf. Theory.

[3]  László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[4]  Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[5]  G. Lugosi, et al. On Concentration-of-Measure Inequalities, 1998.

[6]  David A. McAllester. PAC-Bayesian model averaging, 1999, COLT '99.

[7]  E. Mammen, et al. Smooth Discrimination Analysis, 1999.

[8]  S. Boucheron, et al. A sharp concentration inequality with applications, 1999, Random Struct. Algorithms.

[9]  M. Kohler. Inequalities for uniform deviations of averages from expectations with applications to nonparametric regression, 2000.

[10]  Dmitry Panchenko, et al. Some Local Measures of Complexity of Convex Hulls and Generalization Bounds, 2002, COLT.

[11]  Matthias W. Seeger, et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification, 2003, J. Mach. Learn. Res.

[12]  O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms, 2002.

[13]  Peter L. Bartlett, et al. Localized Rademacher Complexities, 2002, COLT.

[14]  Manfred K. Warmuth, et al. Relating Data Compression and Learnability, 2003.

[15]  A. Tsybakov, et al. Optimal aggregation of classifiers in statistical learning, 2003.

[16]  Olivier Catoni, et al. Statistical learning theory and stochastic optimization, 2004.

[17]  O. Catoni. A PAC-Bayesian approach to adaptive classification, 2004.