Generalization Bounds via Information Density and Conditional Information Density

We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis. When suitably weakened, these bounds recover many of the information-theoretic bounds available in the literature. We also extend the proposed exponential-inequality approach to the setting recently introduced by Steinke and Zakynthinou (2020), where the learning algorithm depends on a randomly selected subset of the available training data. For this setup, we present bounds for bounded loss functions in terms of the conditional information density between the output hypothesis and the random variable determining the subset choice, given all training data. Through our approach, we recover the average generalization bound presented by Steinke and Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios. For the single-draw scenario, we also obtain novel bounds in terms of the conditional $\alpha$-mutual information and the conditional maximal leakage.
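
For orientation, the following is an illustrative sketch (with notation introduced here for exposition, not taken from the paper) of the two prototypical average-generalization bounds alluded to above: the sub-Gaussian mutual-information bound of Xu and Raginsky [26], which is recovered by weakening the information-density bounds, and the conditional-mutual-information bound of Steinke and Zakynthinou [20]. Let $W$ denote the output hypothesis and $Z=(Z_1,\dots,Z_n)$ the training data. If the loss $\ell(w,Z_i)$ is $\sigma$-sub-Gaussian for every $w$, then the average generalization error satisfies

$\left|\mathbb{E}\left[\mathrm{gen}(W,Z)\right]\right| \le \sqrt{\frac{2\sigma^2}{n}\, I(W;Z)},$

where the mutual information $I(W;Z)$ is the expectation of the information density $\imath(W,Z)=\log\frac{\mathrm{d}P_{WZ}}{\mathrm{d}(P_W\otimes P_Z)}$. In the setting of Steinke and Zakynthinou, one instead draws a supersample $\tilde{Z}\in\mathcal{Z}^{n\times 2}$ and a selection vector $S\in\{0,1\}^n$ that picks one element of each pair for training; for a loss bounded in $[0,1]$, the recovered average bound takes the form

$\left|\mathbb{E}\left[\mathrm{gen}\right]\right| \le \sqrt{\frac{2}{n}\, I(W;S\mid\tilde{Z})},$

with $I(W;S\mid\tilde{Z})$ the conditional mutual information, i.e., the expectation of the conditional information density between the output hypothesis and the subset-selection variable, given all training data.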

[1] Gintare Karolina Dziugaite, et al., Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms, 2020, NeurIPS.

[2] Giuseppe Durisi, et al., Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures, 2020, arXiv.

[3] Sudeep Kamath, et al., An Operational Approach to Information Leakage, 2018, IEEE Transactions on Information Theory.

[4] O. Catoni, PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 2007, arXiv:0712.0248.

[5] Ryan P. Adams, et al., Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach, 2018, ICLR.

[6] Martin J. Wainwright, High-Dimensional Statistics, 2019.

[7] Stefano Soatto, et al., Emergence of Invariance and Disentanglement in Deep Representations, 2018, Information Theory and Applications Workshop (ITA).

[8] Samy Bengio, et al., Understanding deep learning requires rethinking generalization, 2016, ICLR.

[9] F. Alajaji, et al., Lecture Notes in Information Theory, 2000.

[10] Benjamin Guedj, et al., Still No Free Lunches: The Price to Pay for Tighter PAC-Bayes Bounds, 2019, Entropy.

[11] Giuseppe Durisi, et al., Generalization Error Bounds via mth Central Moments of the Information Density, 2020, IEEE International Symposium on Information Theory (ISIT).

[12] Gintare Karolina Dziugaite, et al., Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates, 2019, NeurIPS.

[13] Gintare Karolina Dziugaite, et al., On the role of data in PAC-Bayes bounds, 2020, arXiv.

[14] Benjamin Guedj, et al., A Primer on PAC-Bayesian Learning, 2019, ICML.

[15] David A. McAllester, Some PAC-Bayesian Theorems, 1998, COLT.

[16] Masahito Hayashi, et al., Operational Interpretation of Rényi Information Measures via Composite Hypothesis Testing Against Product and Markov Distributions, 2015, IEEE Transactions on Information Theory.

[17] Sergio Verdú, et al., Chaining Mutual Information and Tightening Generalization Bounds, 2018, NeurIPS.

[18] Shaofeng Zou, et al., Tightening Mutual Information Based Bounds on Generalization Error, 2019, IEEE International Symposium on Information Theory (ISIT).

[19] Raef Bassily, et al., Learners that Use Little Information, 2017, ALT.

[20] Thomas Steinke, et al., Reasoning About Generalization via Conditional Mutual Information, 2020, COLT.

[21] James Zou, et al., Controlling Bias in Adaptive Data Analysis Using Information Theory, 2015, AISTATS.

[22] Aaron Roth, et al., The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.

[23] Peter E. Latham, et al., Mutual Information, 2006.

[24] R. van Handel, Probability in High Dimension, 2014.

[25] Michael Gastpar, et al., Generalization Error Bounds via Rényi-, f-Divergences and Maximal Leakage, 2019, IEEE Transactions on Information Theory.

[26] Maxim Raginsky, et al., Information-theoretic analysis of generalization capability of learning algorithms, 2017, NIPS.

[27] Gintare Karolina Dziugaite, et al., Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.

[28] Shai Ben-David, et al., Understanding Machine Learning: From Theory to Algorithms, 2014.

[29] Peter Harremoës, et al., Rényi Divergence and Kullback-Leibler Divergence, 2012, IEEE Transactions on Information Theory.

[30] Toniann Pitassi, et al., Generalization in Adaptive Data Analysis and Holdout Reuse, 2015, NIPS.