PAC-Bayesian Bounds based on the Rényi Divergence

We propose a simplified proof process for PAC-Bayesian generalization bounds that divides the proof into four successive inequalities, making it easier to "customize" PAC-Bayesian theorems. We also propose a family of PAC-Bayesian bounds based on the Rényi divergence between the prior and posterior distributions, whereas most PAC-Bayesian bounds are based on the Kullback-Leibler divergence. Finally, we present an empirical evaluation of the tightness of each inequality of the simplified proof, for both the classical PAC-Bayesian bounds and those based on the Rényi divergence.
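
To make the contrast between the two divergences concrete, here is a minimal sketch that compares them on a finite hypothesis set. It is not taken from the paper and does not reproduce any of its bounds. It assumes the standard definition D_alpha(Q||P) = (1/(alpha-1)) ln sum_i q_i^alpha p_i^(1-alpha) for order alpha > 1 and a prior with full support. The function names and the example prior and posterior are hypothetical, chosen only for illustration.

```python
import math


def kl_divergence(q, p):
    """KL(Q||P) for discrete distributions given as equal-length lists.

    Assumes p[i] > 0 wherever q[i] > 0; terms with q[i] == 0 contribute 0.
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def renyi_divergence(q, p, alpha):
    """Renyi divergence D_alpha(Q||P) of order alpha > 1.

    Assumes every p[i] > 0 (a full-support prior); otherwise the
    power p[i] ** (1 - alpha) is undefined.
    """
    if alpha <= 1:
        raise ValueError("this sketch only handles alpha > 1")
    total = sum(qi ** alpha * pi ** (1 - alpha) for qi, pi in zip(q, p))
    return math.log(total) / (alpha - 1)


# Hypothetical example: uniform prior over four hypotheses and a
# posterior concentrated on the first one.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.7, 0.1, 0.1, 0.1]

print("KL(Q||P)      =", kl_divergence(posterior, prior))
for alpha in (1.0001, 2.0, 5.0):
    print(f"D_{alpha}(Q||P) =", renyi_divergence(posterior, prior, alpha))
```

The Rényi divergence is nondecreasing in its order and approaches the Kullback-Leibler divergence as the order tends to 1, so the printed values for larger orders should be no smaller than the Kullback-Leibler value.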
