Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms

The information-theoretic framework of Russo and J. Zou (2016) and Xu and Raginsky (2017) bounds the generalization error of a learning algorithm in terms of the mutual information between the algorithm's output and the training sample. In this work, we study the proposal by Steinke and Zakynthinou (2020) to reason about the generalization error of a learning algorithm by introducing a supersample that contains the training sample as a random subset and computing mutual information conditional on the supersample. We first show that these bounds based on conditional mutual information are tighter than those based on unconditional mutual information. We then introduce yet tighter bounds, building on the "individual sample" idea of Bu, S. Zou, and Veeravalli (2019) and the "data dependent" ideas of Negrea et al. (2019), using disintegrated mutual information. Finally, we apply these bounds to the study of the Langevin dynamics algorithm, showing that conditioning on the supersample allows us to exploit information in the optimization trajectory to obtain tighter bounds based on hypothesis tests.
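For context, a sketch of the two generalization bounds the abstract compares; the statements follow Xu and Raginsky (2017) and Steinke and Zakynthinou (2020), but the exact constants and conditions should be checked against those papers. Here W is the learned hypothesis, S the n-point training sample, Z-tilde the supersample, and U the selector bits.

```latex
% Mutual-information bound (Xu & Raginsky, 2017): if the loss \ell(w, Z)
% is \sigma-sub-Gaussian under the data distribution for every w, then
\left|\, \mathbb{E}\,\mathrm{gen}(W, S) \,\right|
    \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)} .

% Conditional-mutual-information bound (Steinke & Zakynthinou, 2020):
% draw a supersample \tilde{Z} \in \mathcal{Z}^{n \times 2} and uniform
% selector bits U \in \{0,1\}^{n}, and train on
% S = (\tilde{Z}_{i, U_i})_{i=1}^{n}. For losses bounded in [0, 1],
\left|\, \mathbb{E}\,\mathrm{gen}(W, S) \,\right|
    \;\le\; \sqrt{\frac{2}{n}\, I(W; U \mid \tilde{Z})} .
```

And a minimal, illustrative code sketch of the supersample construction together with one step of the Langevin dynamics algorithm the abstract refers to; the loss, dimensions, and constants here are hypothetical, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's code): the Steinke--Zakynthinou
# supersample construction, followed by one step of the Langevin
# dynamics algorithm on the selected training sample.

rng = np.random.default_rng(0)
n, d = 100, 5

z_tilde = rng.normal(size=(n, 2, d))   # supersample: n pairs of data points
u = rng.integers(0, 2, size=n)         # selector bits U ~ Unif({0,1}^n)
s = z_tilde[np.arange(n), u]           # training sample S = (z_tilde[i, u[i]])_i

def grad_loss(w, batch):
    """Gradient of an illustrative squared-error loss (1/2m) sum_i (x_i.w - 1)^2."""
    return (batch @ w - 1.0) @ batch / len(batch)

# One Langevin dynamics update:
#   w_{t+1} = w_t - eta * grad_loss(w_t, S) + sqrt(2 * eta / beta) * N(0, I)
eta, beta = 0.1, 4.0                   # step size and inverse temperature
w = np.zeros(d)
w = w - eta * grad_loss(w, s) + np.sqrt(2.0 * eta / beta) * rng.normal(size=d)
```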

[1] Olav Kallenberg. Foundations of Modern Probability. Probability Theory and Stochastic Modelling, Springer, 2021.

[2] Thomas Steinke and Lydia Zakynthinou. Reasoning About Generalization via Conditional Mutual Information. COLT, 2020.

[3] Amir R. Asadi and Emmanuel Abbe. Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets. arXiv, 2019.

[4] Jian Li, Xuanyuan Luo, and Mingda Qiao. On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning. ICLR, 2020.

[5] Daniel Russo and James Zou. How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage. IEEE Transactions on Information Theory, 2020.

[6] Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, and Daniel M. Roy. Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates. NeurIPS, 2019.

[7] Ibrahim Issa, Amedeo Roberto Esposito, and Michael Gastpar. Strengthened Information-theoretic Bounds on the Generalization Error. IEEE ISIT, 2019.

[8] Yuheng Bu, Shaofeng Zou, and Venugopal V. Veeravalli. Tightening Mutual Information Based Bounds on Generalization Error. IEEE ISIT, 2019.

[9] Adrian Tovar Lopez and Varun Jog. Generalization error bounds using Wasserstein distances. IEEE ITW, 2018.

[10] Amir R. Asadi, Emmanuel Abbe, and Sergio Verdú. Chaining Mutual Information and Tightening Generalization Bounds. NeurIPS, 2018.

[11] Ankit Pensia, Varun Jog, and Po-Ling Loh. Generalization Error Bounds for Noisy, Iterative Algorithms. IEEE ISIT, 2018.

[12] Raef Bassily, Shay Moran, Ido Nachum, Jonathan Shafer, and Amir Yehudayoff. Learners that Use Little Information. ALT, 2018.

[13] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv, 2017.

[14] Aolin Xu and Maxim Raginsky. Information-theoretic analysis of generalization capability of learning algorithms. NIPS, 2017.

[15] Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Dependence measures bounding the exploration bias for general measurements. IEEE ISIT, 2017.

[16] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership Inference Attacks Against Machine Learning Models. IEEE Symposium on Security and Privacy, 2017.

[17] Maxim Raginsky, Alexander Rakhlin, Matthew Tsao, Yihong Wu, and Aolin Xu. Information-theoretic analysis of stability and bias of learning algorithms. IEEE ITW, 2016.

[18] Daniel Russo and James Zou. Controlling Bias in Adaptive Data Analysis Using Information Theory. AISTATS, 2016.

[19] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

[20] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Technical report, University of Toronto, 2009.

[21] Saul B. Gelfand and Sanjoy K. Mitter. Recursive stochastic algorithms for global optimization in R^d. SIAM Journal on Control and Optimization, 1991.

[22] Rick Durrett. Probability: Theory and Examples. 1993.

[23] Te Sun Han. Nonnegative Entropy Measures of Multivariate Symmetric Correlations. Information and Control, 1978.