Tuning the Distribution Dependent Prior in the PAC-Bayes Framework based on Empirical Data

In this paper, we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution. In particular, following Catoni [1], we refine some recent generalisation bounds on the risk of the Gibbs classifier, obtained when the prior is defined in terms of the data-generating distribution and the posterior in terms of the observed (empirical) one. Moreover, we show that both the prior and the posterior distributions can be tuned based on the observed samples without worsening the convergence rate of the bounds and with only a marginal impact on their constants.
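For context, the kind of result being refined is the classical PAC-Bayes bound on the risk of the Gibbs classifier. The sketch below gives one common (Seeger/Maurer-style) form of that background bound, not the refined bound of this paper; the symbols R(G_Q), \hat{R}_S(G_Q), P, Q, n and \delta are introduced here for illustration, and the exact logarithmic term varies across versions in the literature.

```latex
% Standard PAC-Bayes bound for the Gibbs classifier (background sketch).
% R(G_Q): true risk of the Gibbs classifier under posterior Q;
% \hat{R}_S(G_Q): its empirical risk on an i.i.d. sample S of size n;
% P: prior over classifiers (chosen before seeing S), Q: posterior;
% the logarithmic term differs slightly across versions in the literature.
\[
  \mathrm{kl}\!\left(\hat{R}_S(G_Q)\,\middle\|\,R(G_Q)\right)
  \;\le\;
  \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n}
  \qquad \text{with probability at least } 1-\delta \text{ over } S,
\]
\[
  \text{where}\quad
  \mathrm{kl}(q\,\|\,p) \;=\; q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}.
\]
```

In this form, the bound is driven by the term KL(Q‖P): a prior P built from the data-generating distribution (rather than fixed arbitrarily) can make this term much smaller while remaining valid, since P does not depend on the observed sample. The contribution described in the abstract is to show that both P and Q can additionally be tuned on the observed samples without degrading the convergence rate of such bounds.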

[1] John Shawe-Taylor, et al. Tighter PAC-Bayes bounds through distribution-dependent priors, 2013, Theor. Comput. Sci.

[2] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.

[3] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[4] Manfred K. Warmuth, et al. Sample compression, learnability, and the Vapnik-Chervonenkis dimension, 1995, Machine Learning.

[5] François Laviolette, et al. PAC-Bayesian learning of linear classifiers, 2009, ICML '09.

[6] P. Bartlett, et al. Local Rademacher complexities, 2005, math/0508275.

[7] S. Varadhan, et al. Asymptotic evaluation of certain Markov process expectations for large time, 1975.

[8] John K. Kruschke, et al. Bayesian data analysis, 2010, Wiley Interdisciplinary Reviews: Cognitive Science.

[9] Daniel Berend, et al. Consistency of weighted majority votes, 2013, NIPS.

[10] Shiliang Sun, et al. PAC-Bayes bounds with data dependent priors, 2012, J. Mach. Learn. Res.

[11] Shmuel Nitzan, et al. Optimal Decision Rules in Uncertain Dichotomous Choice Situations, 1982.

[12] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[13] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[14] Leo Breiman. Random Forests, 2001, Machine Learning.

[15] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.

[16] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[17] François Laviolette, et al. Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm, 2015, J. Mach. Learn. Res.

[18] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.