PAC-Bayesian Theory

The PAC-Bayesian framework is a frequentist approach to machine learning that encodes learner bias as a “prior probability” over hypotheses. This chapter reviews basic PAC-Bayesian theory, including Catoni’s basic inequality and his localization theorem.
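
As a concrete point of reference for the kind of guarantee the chapter develops, the following is a McAllester-style PAC-Bayesian bound, stated as a sketch only: exact constants and logarithmic factors vary across presentations, and Catoni’s basic inequality is a related but differently parameterized statement. Assume an i.i.d. sample S of size n, a loss bounded in [0, 1], a prior P over hypotheses fixed before seeing S, true and empirical risks L(h) and \hat{L}_S(h), and a confidence parameter \delta in (0, 1). Then, with probability at least 1 - \delta over the draw of S, simultaneously for every posterior Q,

\[
  \mathbb{E}_{h \sim Q}\!\left[ L(h) \right]
  \;\le\;
  \mathbb{E}_{h \sim Q}\!\left[ \hat{L}_S(h) \right]
  \;+\;
  \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}} .
\]

The KL(Q || P) term makes the learner’s bias explicit: posteriors that stay close to the prior pay a small complexity penalty, which is the sense in which the prior encodes bias.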
