An Information-Theoretic View of Generalization via Wasserstein Distance

We leverage the Wasserstein distance to obtain two information-theoretic bounds on the generalization error of learning algorithms. First, we specialize the Wasserstein distance to the total variation distance by using the discrete metric. In this case we derive a generalization bound and, via a strong data-processing inequality, show how the bound can be tightened by adding Gaussian noise to the output hypothesis. Second, we consider the Wasserstein distance under a generic metric and derive a generalization bound by exploiting the geometric nature of the Kantorovich-Rubinstein duality theorem. We illustrate the use of these bounds with examples. Our bounds can handle certain cases in which existing bounds based on mutual information fail.
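
For readers unfamiliar with the quantities involved, the following standard definitions sketch the two regimes described above; the notation (distributions P and Q on the hypothesis space, metric d, set of couplings Pi(P,Q)) is generic and not taken verbatim from the paper.

% Wasserstein distance between P and Q under a metric d:
\[
  \mathbb{W}(P, Q) \;=\; \inf_{\pi \in \Pi(P, Q)} \int d(w, w') \, \mathrm{d}\pi(w, w').
\]
% With the discrete metric d(w, w') = \mathbf{1}\{w \neq w'\}, the optimal coupling is the
% maximal coupling and the Wasserstein distance reduces to total variation:
\[
  \mathbb{W}(P, Q) \;=\; \mathrm{TV}(P, Q) \;=\; \sup_{A} \bigl| P(A) - Q(A) \bigr|.
\]
% Under a generic metric, the Kantorovich-Rubinstein duality rewrites the distance as a
% supremum over 1-Lipschitz test functions, which is the geometric fact the second bound exploits:
\[
  \mathbb{W}(P, Q) \;=\; \sup_{\operatorname{Lip}(f) \le 1} \Bigl( \mathbb{E}_{P}[f] - \mathbb{E}_{Q}[f] \Bigr).
\]

In particular, if the loss is Lipschitz in the hypothesis, the duality immediately bounds differences of expected losses by the Lipschitz constant times the Wasserstein distance, which is the mechanism underlying bounds of this type.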
