Paper information - PAC-Bayesian Bounds based on the Rényi Divergence

We propose a simplified proof process for PAC-Bayesian generalization bounds that divides the proof into four successive inequalities, making it easier to "customize" PAC-Bayesian theorems. We also propose a family of PAC-Bayesian bounds based on the Rényi divergence between the prior and posterior distributions, whereas most PAC-Bayesian bounds are based on the Kullback-Leibler divergence. Finally, we present an empirical evaluation of the tightness of each inequality of the simplified proof, for both the classical PAC-Bayesian bounds and those based on the Rényi divergence.
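To make the contrast between the two divergences concrete, here is a minimal sketch of the standard Rényi divergence of order alpha between two discrete distributions, next to the Kullback-Leibler divergence. It illustrates only the textbook definitions, not the bounds proposed in the paper. The function names, the example `prior` and `posterior` vectors, and the restriction to finite distributions with a strictly positive prior are all assumptions made for illustration.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions given as equal-length sequences.

    Assumes p[i] > 0 wherever q[i] > 0 (illustrative restriction).
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def renyi_divergence(q, p, alpha):
    """Renyi divergence D_alpha(q || p) = log(sum_i q_i^alpha * p_i^(1 - alpha)) / (alpha - 1).

    Defined here for alpha > 0, alpha != 1, with the same positivity assumption on p.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(qi ** alpha * pi ** (1 - alpha) for qi, pi in zip(q, p) if qi > 0)
    return math.log(total) / (alpha - 1)

# Hypothetical prior and posterior over four hypotheses.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.7, 0.1, 0.1, 0.1]

kl = kl_divergence(posterior, prior)
print("KL:", kl)
for alpha in (1.001, 2.0, 5.0):
    print("alpha =", alpha, "Renyi:", renyi_divergence(posterior, prior, alpha))
```

In this sketch the Rényi divergence approaches the Kullback-Leibler divergence as alpha tends to 1 and grows with alpha, which is the general behavior of the divergence in its order.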
