Generalization Error Bounds via Rényi-, f-Divergences and Maximal Leakage

In this work, the probability of an event under a joint distribution is bounded in terms of its probability under the product of the marginals (which is typically easier to analyze), multiplied by a term capturing the dependence between the two random variables. These results find applications in adaptive data analysis, where multiple dependencies are introduced, and in learning theory, where they can be employed to bound the generalization error of a learning algorithm. Bounds are given in terms of the $\alpha$-Divergence, Sibson's Mutual Information, and $f$-Divergence. A case of particular interest is Maximal Leakage (Sibson's Mutual Information of order infinity), since this measure is robust to post-processing and composes adaptively. The corresponding bound can also be seen as a generalization of classical bounds, such as Hoeffding's and McDiarmid's inequalities, to the case of dependent random variables. A sketch of the Maximal Leakage bound and its learning-theoretic consequence follows.
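To make the shape of these results concrete, the following is a minimal sketch of the Maximal Leakage version of the decoupling bound, stated for finite alphabets; the fiber notation $E_y$ and the constants in the learning-theoretic corollary are introduced here for illustration and should be checked against the formal statements in the paper. For a joint distribution $P_{XY}$ and an event $E$, write $E_y = \{x : (x,y) \in E\}$ and let

$$\mathcal{L}(X \to Y) \;=\; \log \sum_{y} \; \max_{x \,:\, P_X(x) > 0} P_{Y|X}(y \mid x)$$

denote the Maximal Leakage from $X$ to $Y$. The decoupling bound then reads

$$P_{XY}(E) \;\le\; e^{\mathcal{L}(X \to Y)} \cdot \max_{y} P_X(E_y),$$

i.e., the joint probability of $E$ is controlled by its worst-case probability under the marginal of $X$ alone, inflated by a dependence factor. Instantiating $X$ as the training sample $S$ of size $n$, $Y$ as the hypothesis $W$ output by the learning algorithm, and $E$ as the event that the generalization error exceeds $\eta$, Hoeffding's inequality bounds each fiber probability for a loss taking values in $[a, b]$, which gives a bound of the form

$$P\big(|\mathrm{gen}(W, S)| > \eta\big) \;\le\; 2\, e^{\mathcal{L}(S \to W)} \exp\!\left(-\frac{2 n \eta^2}{(b-a)^2}\right).$$

When $W$ is independent of $S$, $\mathcal{L}(S \to W) = 0$ and Hoeffding's inequality is recovered, which is the sense in which the result generalizes classical concentration bounds to dependent random variables.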
