Efron-Stein PAC-Bayesian Inequalities

We prove semi-empirical concentration inequalities for random variables that are given as possibly nonlinear functions of independent random variables. These inequalities describe the concentration of a random variable in terms of the data/distribution-dependent Efron-Stein (ES) estimate of its variance, and they do not require any additional assumptions on the moments. In particular, this allows us to state semi-empirical Bernstein-type inequalities for general functions of unbounded random variables, which gives user-friendly concentration bounds for cases where related methods (e.g. bounded differences) might be more challenging to apply. We extend these results to Efron-Stein PAC-Bayesian inequalities, which hold for arbitrary probability kernels that define a random, data-dependent choice of the function of interest. Finally, we demonstrate a number of applications, including PAC-Bayesian generalization bounds for unbounded loss functions, empirical Bernstein-type generalization bounds, new truncation-free bounds for off-policy evaluation with Weighted Importance Sampling (WIS), and off-policy PAC-Bayesian learning with WIS.
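
To make the two central objects of the abstract concrete, the following is a minimal Python sketch, not the paper's own construction. It assumes the textbook Efron-Stein inequality, Var f(X) <= (1/2) sum_i E[(f(X) - f(X^(i)))^2], where X^(i) replaces the i-th coordinate with an independent copy, and approximates the right-hand side for a fixed sample by Monte Carlo resampling. The function names `wis_estimate` and `efron_stein_proxy`, the `draw` callback, and the `n_copies` parameter are hypothetical and chosen only for illustration; the paper's semi-empirical ES estimate may be defined differently.

```python
import random


def wis_estimate(rewards, weights):
    """Weighted (self-normalized) importance sampling estimate.

    Returns sum_i w_i * r_i / sum_i w_i, a nonlinear function of the
    independent (reward, weight) pairs.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must have a positive sum")
    return sum(w * r for w, r in zip(weights, rewards)) / total


def efron_stein_proxy(f, sample, draw, n_copies=200, seed=0):
    """Monte Carlo proxy for the Efron-Stein variance bound (illustrative).

    For each coordinate i, the i-th element of `sample` is replaced by a
    fresh value from `draw(rng)`, and the squared change in f is averaged
    over `n_copies` replacements. The result is half the sum of these
    per-coordinate averages.
    """
    rng = random.Random(seed)
    base = f(sample)
    total = 0.0
    for i in range(len(sample)):
        acc = 0.0
        for _ in range(n_copies):
            perturbed = list(sample)
            perturbed[i] = draw(rng)  # independent copy of coordinate i
            acc += (base - f(perturbed)) ** 2
        total += acc / n_copies
    return 0.5 * total


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: rewards in [0, 1] and positive importance weights.
    pairs = [(rng.random(), rng.expovariate(1.0)) for _ in range(50)]

    def f(s):
        return wis_estimate([r for r, _ in s], [w for _, w in s])

    def draw(g):
        return (g.random(), g.expovariate(1.0))

    print("WIS estimate:", f(pairs))
    print("Efron-Stein proxy:", efron_stein_proxy(f, pairs, draw))
```

The proxy needs no bound on the weights or rewards, only the ability to redraw a single coordinate, which is the sense in which such variance estimates sidestep bounded-differences conditions.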

Ilja Kuzborskij | Csaba Szepesvári

[1] Gintare Karolina Dziugaite,et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.

[2] Andreas Maurer. A Bernstein-type inequality for functions of bounded interaction , 2019, Bernoulli.

[3] Massimiliano Pontil,et al. Empirical bounds for functions with weak interactions , 2018, COLT.

[4] Pierre Alquier,et al. On the properties of variational approximations of Gibbs posteriors , 2015, J. Mach. Learn. Res..

[5] Peter L. Bartlett,et al. Localized Rademacher Complexities , 2002, COLT.

[6] P. MassartLedoux,et al. Concentration Inequalities Using the Entropy Method , 2002 .

[7] Matthias W. Seeger,et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification , 2003, J. Mach. Learn. Res..

[8] Peter Grünwald,et al. PAC-Bayes Un-Expected Bernstein Inequality , 2019, NeurIPS.

[9] John Shawe-Taylor,et al. PAC-Bayesian Inequalities for Martingales , 2011, IEEE Transactions on Information Theory.

[10] Andreas Maurer,et al. A Note on the PAC Bayesian Theorem , 2004, ArXiv.

[11] Matthew J. Holland. PAC-Bayes under potentially heavy tails , 2019, NeurIPS.

[12] J. Lynch,et al. A weak convergence approach to the theory of large deviations , 1997 .

[13] Christian Igel,et al. A Strongly Quasiconvex PAC-Bayesian Bound , 2016, ALT.

[14] Csaba Szepesvári,et al. An Exponential Tail Bound for Lq Stable Learning Rules. Application to k-Folds Cross-Validation , 2019, ISAIM.

[15] Andreas Maurer. A bound on the deviation probability for sums of non-negative random variables. , 2003 .

[16] Ilja Kuzborskij,et al. Distribution-Dependent Analysis of Gibbs-ERM Principle , 2019, COLT.

[17] David A. McAllester,et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks , 2017, ICLR.

[18] Marcello Restelli,et al. Policy Optimization via Importance Sampling , 2018, NeurIPS.

[19] Yishay Mansour,et al. Learning Bounds for Importance Weighting , 2010, NIPS.

[20] David A. McAllester. Some PAC-Bayesian Theorems , 1998, COLT' 98.

[21] T. Hesterberg,et al. Weighted Average Importance Sampling and Defensive Mixture Distributions , 1995 .

[22] Philip S. Thomas,et al. High-Confidence Off-Policy Evaluation , 2015, AAAI.

[23] Gábor Lugosi,et al. Concentration Inequalities - A Nonasymptotic Theory of Independence , 2013, Concentration Inequalities.

[24] S. Varadhan,et al. Asymptotic evaluation of certain Markov process expectations for large time , 1975 .

[25] Csaba Szepesvári,et al. Tuning Bandit Algorithms in Stochastic Environments , 2007, ALT.

[26] Ambuj Tewari,et al. Smoothness, Low Noise and Fast Rates , 2010, NIPS.

[27] Pierre Alquier,et al. Simpler PAC-Bayesian bounds for hostile data , 2016, Machine Learning.

[28] Kazuoki Azuma. WEIGHTED SUMS OF CERTAIN DEPENDENT RANDOM VARIABLES , 1967 .

[29] T. Lai,et al. Self-Normalized Processes: Limit Theory and Statistical Applications , 2001 .

[30] Joaquin Quiñonero Candela,et al. Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..

[31] Thorsten Joachims,et al. The Self-Normalized Estimator for Counterfactual Learning , 2015, NIPS.

[32] Peter Grünwald,et al. A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity , 2017, ALT.

[33] Yevgeny Seldin,et al. PAC-Bayes-Empirical-Bernstein Inequality , 2013, NIPS.

[34] John Shawe-Taylor,et al. PAC-Bayes & Margins , 2002, NIPS.

[35] O. Catoni. PAC-BAYESIAN SUPERVISED CLASSIFICATION: The Thermodynamics of Statistical Learning , 2007, 0712.0248.

[36] Alexandre Lacoste,et al. PAC-Bayesian Theory Meets Bayesian Inference , 2016, NIPS.

[37] Shiliang Sun,et al. PAC-Bayes bounds for stable algorithms with instance-dependent priors , 2018, NeurIPS.

[38] Karthik Sridharan,et al. On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities , 2015, COLT.

[39] C. Robert,et al. Rethinking the Effective Sample Size , 2018, International Statistical Review.

[40] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.

[41] Massimiliano Pontil,et al. Empirical Bernstein Bounds and Sample-Variance Penalization , 2009, COLT.

[42] Olivier Wintenberger,et al. Optimal learning with Bernstein online aggregation , 2014, Machine Learning.