Efron-Stein PAC-Bayesian Inequalities

We prove semi-empirical concentration inequalities for random variables that are given as possibly nonlinear functions of independent random variables. These inequalities describe the concentration of a random variable in terms of the data/distribution-dependent Efron-Stein (ES) estimate of its variance, and they do not require any additional assumptions on the moments. In particular, this allows us to state semi-empirical Bernstein-type inequalities for general functions of unbounded random variables, which gives user-friendly concentration bounds for cases where related methods (e.g., bounded differences) might be more challenging to apply. We extend these results to Efron-Stein PAC-Bayesian inequalities, which hold for arbitrary probability kernels that define a random, data-dependent choice of the function of interest. Finally, we demonstrate a number of applications, including PAC-Bayesian generalization bounds for unbounded loss functions, empirical Bernstein-type generalization bounds, new truncation-free bounds for off-policy evaluation with Weighted Importance Sampling (WIS), and off-policy PAC-Bayesian learning with WIS.
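
For orientation, the two objects named above can be written out in their standard textbook forms. This is only a sketch: the notation ($f$, $X_i$, $X_i'$, $w_i$, $r_i$, $\pi$, $\pi_b$) is assumed here for illustration and does not come from the abstract, and the semi-empirical ES estimate used in the paper may be defined differently from the classical bound shown.

```latex
% Classical Efron-Stein inequality (standard form; notation assumed).
% X = (X_1, ..., X_n) are independent, X_i' is an independent copy of X_i,
% and X^{(i)} is X with its i-th coordinate replaced by X_i'.
\[
  \operatorname{Var}\bigl(f(X)\bigr)
  \;\le\;
  \frac{1}{2} \sum_{i=1}^{n}
  \mathbb{E}\Bigl[\bigl(f(X) - f(X^{(i)})\bigr)^{2}\Bigr],
  \qquad
  X^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n).
\]

% Weighted (self-normalized) importance sampling estimator (standard form).
% w_i is the ratio of target to behavior policy probabilities for the
% i-th logged action a_i in context x_i, and r_i is the observed reward.
\[
  \hat{v}^{\mathrm{WIS}}
  = \frac{\sum_{i=1}^{n} w_i \, r_i}{\sum_{i=1}^{n} w_i},
  \qquad
  w_i = \frac{\pi(a_i \mid x_i)}{\pi_b(a_i \mid x_i)}.
\]
```

The WIS estimator is a ratio of two sums, hence a nonlinear function of the independent samples, and the weights $w_i$ can be unbounded. This is the kind of setting in which the abstract says that bounded-differences arguments can be harder to apply.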
