PAC-Bayes Learning Bounds for Sample-Dependent Priors

We present a series of new PAC-Bayes learning guarantees for randomized algorithms with sample-dependent priors. Our most general bounds make no assumption on the priors and are given in terms of certain covering numbers under the infinite-Rényi divergence and the `1 distance. We show how to use these general bounds to derive leaning bounds in the setting where the sample-dependent priors obey an infinite-Rényi divergence or `1-distance sensitivity condition. We also provide a flexible framework for computing PAC-Bayes bounds, under certain stability assumptions on the sample-dependent priors, and show how to use this framework to give more refined bounds when the priors satisfy an infinite-Rényi divergence sensitivity condition.

[1]  John Shawe-Taylor,et al.  A PAC analysis of a Bayesian estimator , 1997, COLT '97.

[2]  David A. McAllester PAC-Bayesian model averaging , 1999, COLT '99.

[3]  John Langford,et al.  (Not) Bounding the True Error , 2001, NIPS.

[4]  Matthias W. Seeger,et al.  PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification , 2003, J. Mach. Learn. Res..

[5]  Partha Niyogi,et al.  Almost-everywhere Algorithmic Stability and Generalization Error , 2002, UAI.

[6]  André Elisseeff,et al.  Stability and Generalization , 2002, J. Mach. Learn. Res..

[7]  David A. McAllester Some PAC-Bayesian Theorems , 1998, COLT' 98.

[8]  T. Poggio,et al.  STABILITY RESULTS IN LEARNING THEORY , 2005 .

[9]  John Shawe-Taylor,et al.  Tighter PAC-Bayes Bounds , 2006, NIPS.

[10]  O. Catoni PAC-BAYESIAN SUPERVISED CLASSIFICATION: The Thermodynamics of Statistical Learning , 2007, 0712.0248.

[11]  Ambuj Tewari,et al.  On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization , 2008, NIPS.

[12]  Shiliang Sun,et al.  PAC-bayes bounds with data dependent priors , 2012, J. Mach. Learn. Res..

[13]  Ameet Talwalkar,et al.  Foundations of Machine Learning , 2012, Adaptive computation and machine learning.

[14]  John Shawe-Taylor,et al.  Tighter PAC-Bayes bounds through distribution-dependent priors , 2013, Theor. Comput. Sci..

[15]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[16]  Toniann Pitassi,et al.  Generalization in Adaptive Data Analysis and Holdout Reuse , 2015, NIPS.

[17]  Yoram Singer,et al.  Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.

[18]  S. Dragomir,et al.  Bounds for Kullback-Leibler divergence , 2016 .

[19]  Aaron Roth,et al.  Max-Information, Differential Privacy, and Post-selection Hypothesis Testing , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[20]  Ben London,et al.  A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent , 2017, NIPS.

[21]  Gintare Karolina Dziugaite,et al.  Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.

[22]  Gintare Karolina Dziugaite,et al.  Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors , 2017, ICML.

[23]  Gintare Karolina Dziugaite,et al.  Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy , 2017, NeurIPS.

[24]  David A. McAllester,et al.  A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks , 2017, ICLR.

[25]  Jan Vondrák,et al.  Generalization Bounds for Uniformly Stable Algorithms , 2018, NeurIPS.

[26]  Shiliang Sun,et al.  PAC-Bayes bounds for stable algorithms with instance-dependent priors , 2018, NeurIPS.

[27]  Haipeng Luo,et al.  Hypothesis Set Stability and Generalization , 2019, NeurIPS.

[28]  J. Zico Kolter,et al.  Uniform convergence may be unable to explain generalization in deep learning , 2019, NeurIPS.

[29]  Suman Jana,et al.  Certified Robustness to Adversarial Examples with Differential Privacy , 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[30]  Jan Vondrák,et al.  High probability generalization bounds for uniformly stable algorithms with nearly optimal rate , 2019, COLT.

[31]  Gintare Karolina Dziugaite,et al.  Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates , 2019, NeurIPS.

[32]  J. Zico Kolter,et al.  Certified Adversarial Robustness via Randomized Smoothing , 2019, ICML.

[33]  Daniel M. Roy,et al.  Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms , 2020, NeurIPS.

[34]  Ilja Kuzborskij,et al.  PAC-Bayes Analysis Beyond the Usual Bounds , 2020, NeurIPS.

[35]  O. Bousquet,et al.  Sharper bounds for uniformly stable algorithms , 2019, COLT.

[36]  Gintare Karolina Dziugaite,et al.  On the role of data in PAC-Bayes bounds , 2020, ArXiv.