Combining PAC-Bayesian and Generic Chaining Bounds

Statistical learning theory offers many different generalization error bounds, each of which improves on the others for certain situations or algorithms. Our goal is, first, to underline the links between these bounds and, second, to combine their different improvements into a single bound. In particular, we combine the PAC-Bayes approach introduced by McAllester (1998), which is of interest for randomized predictions, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand (see Talagrand, 1996), in a way that also takes into account the variance of the combined functions. We also show how this connects to Rademacher-based bounds.

[1] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[2] Shun-ichi Amari, et al. A Theory of Pattern Recognition, 1968.

[3] Vladimir Vapnik, A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[4] R. Dudley. A course on empirical processes, 1984.

[5] E. Giné, et al. Some Limit Theorems for Empirical Processes, 1984.

[6] G. Pisier. Probabilistic methods in the geometry of Banach spaces, 1986.

[7] M. Talagrand. The Glivenko-Cantelli Problem, 1987.

[8] M. Talagrand, et al. Probability in Banach spaces, 1991.

[9] A. Dembo, et al. Large Deviation Techniques and Applications, 1994.

[10] Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[11] M. Talagrand. Majorizing measures: the generic chaining, 1996.

[12] G. Lugosi, et al. On Concentration-of-Measure Inequalities, 1998.

[13] David A. McAllester. PAC-Bayesian model averaging, 1999, COLT '99.

[14] S. Boucheron, et al. A sharp concentration inequality with applications, 1999, Random Struct. Algorithms.

[15] P. Massart. Some applications of concentration inequalities to statistics, 2000.

[16] S. Geer. Empirical Processes in M-Estimation, 2000.

[18] Luc Devroye, et al. Combinatorial methods in density estimation, 2001, Springer series in statistics.

[19] S. R. Jammalamadaka, et al. Empirical Processes in M-Estimation, 2001.

[20] M. Talagrand. Majorizing measures without measures, 2001.

[21] D. Panchenko. Symmetrization approach to concentration inequalities for empirical processes, 2003, math/0405354.

[22] Matthias W. Seeger, et al. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations, 2003.

[23] A. Tsybakov, et al. Optimal aggregation of classifiers in statistical learning, 2003.

[24] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[25] Jean-Yves Audibert. A better variance control for PAC-Bayesian classification, 2004.

[26] Peter L. Bartlett, et al. Local Complexities for Empirical Risk Minimization, 2004, COLT.

[27] O. Catoni. A PAC-Bayesian approach to adaptive classification, 2004.

[28] Jean-Yves Audibert. Aggregated estimators and empirical complexity for least square regression, 2004.

[29] Jean-Yves Audibert. Data-dependent generalization error bounds for (noisy) classification: a PAC-Bayesian approach, 2004.

[30] O. Bousquet. Theory of classification: a survey of recent advances, 2004.

[31] P. Bartlett, et al. Local Rademacher complexities, 2005, math/0508275.

[32] M. Talagrand. The generic chaining: upper and lower bounds of stochastic processes, 2005.

[33] G. Lugosi. Concentration-of-measure inequalities: lecture notes by Gábor Lugosi, 2005.

[34] S. Boucheron, et al. Theory of classification: a survey of some recent advances, 2005.

[35] P. Bartlett, et al. Empirical minimization, 2006.

[36] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization, 2006, 0708.0083.