Upper Bounds on the Generalization Error of Private Algorithms for Discrete Data

In this work, we study the generalization capability of algorithms from an information-theoretic perspective. It has been shown that the expected generalization error of an algorithm is bounded from above by a function of the relative entropy between the conditional probability distribution of the algorithm's output hypothesis, given the dataset with which it was trained, and its marginal probability distribution. We build upon this fact and introduce a mathematical formulation to obtain upper bounds on this relative entropy. Assuming that the data are discrete, we then use this formulation to develop a strategy, based on the method of types and typicality, that yields explicit upper bounds on the generalization error of stable algorithms, i.e., algorithms that produce similar output hypotheses given similar input datasets. In particular, we present the bounds this strategy yields for $\epsilon$-differentially private ($\epsilon$-DP) and $\mu$-Gaussian differentially private ($\mu$-GDP) algorithms.
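The following is a minimal sketch of the kind of relative-entropy bound described above. It is not the construction used in this work. It assumes the well-known mutual-information bound of Xu and Raginsky, $|\mathbb{E}[\mathrm{gen}]| \le \sqrt{2\sigma^2 I(S;W)/n}$ for a loss that is $\sigma$-sub-Gaussian under the data distribution for every hypothesis. Here $I(S;W) = \mathbb{E}_S[D(P_{W|S} \,\|\, P_W)]$ is exactly the expected relative entropy between the conditional and marginal distributions of the output hypothesis.

The toy setting is hypothetical and chosen only for illustration:

- Bernoulli($p$) data.
- Two hypotheses $w \in \{0,1\}$.
- 0-1 loss.
- The algorithm `rr_majority`, which applies randomized response to the majority vote.

This algorithm is $\epsilon$-DP. Each output probability lies in $[1/(1+e^\epsilon),\, e^\epsilon/(1+e^\epsilon)]$, so changing one sample changes any output probability by a factor of at most $e^\epsilon$. Because the algorithm sees the dataset only through its type (the number of ones $k$), sums over all $2^n$ datasets collapse to sums over the $n+1$ type classes, in the spirit of the method of types.

```python
import math


def binom_pmf(n: int, k: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p): total probability of the type class with k ones."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def rr_majority(k: int, n: int, eps: float) -> float:
    """Hypothetical eps-DP algorithm (illustration only): returns P(W = 1 | K = k).

    Outputs the majority bit of the dataset with probability e^eps / (1 + e^eps)
    and the flipped bit otherwise (randomized response on the majority vote).
    """
    keep = math.exp(eps) / (1 + math.exp(eps))
    majority = 1 if 2 * k > n else 0
    return keep if majority == 1 else 1 - keep


def gen_and_bound(n: int, p: float, eps: float) -> tuple[float, float, float]:
    """Exact expected generalization error, I(S;W) in nats, and the MI bound.

    Assumed toy setting: Z ~ Bernoulli(p), hypotheses w in {0, 1},
    loss l(w, z) = 1{w != z}. Since W depends on S only through K = #ones,
    I(S; W) = I(K; W), and every expectation over S is a sum over k.
    """
    pk = [binom_pmf(n, k, p) for k in range(n + 1)]
    q1 = [rr_majority(k, n, eps) for k in range(n + 1)]
    pw1 = sum(a * b for a, b in zip(pk, q1))  # marginal P(W = 1)
    pw = {0: 1 - pw1, 1: pw1}
    pop_risk = {0: p, 1: 1 - p}  # population risk L_mu(w) = P(Z != w)

    gen = 0.0
    mi = 0.0
    for k in range(n + 1):
        emp_risk = {0: k / n, 1: (n - k) / n}  # empirical risk L_S(w) for type k/n
        for w, pw_given_k in ((0, 1 - q1[k]), (1, q1[k])):
            if pw_given_k == 0.0:
                continue
            gen += pk[k] * pw_given_k * (pop_risk[w] - emp_risk[w])
            # E_S[D(P_{W|S} || P_W)] accumulated term by term, natural log (nats)
            mi += pk[k] * pw_given_k * math.log(pw_given_k / pw[w])
    # A loss bounded in [0, 1] is (1/2)-sub-Gaussian, so sqrt(2 sigma^2 I / n) = sqrt(I / (2n)).
    bound = math.sqrt(max(mi, 0.0) / (2 * n))
    return gen, mi, bound


if __name__ == "__main__":
    gen, mi, bound = gen_and_bound(n=50, p=0.3, eps=1.0)
    print(f"E[gen] = {gen:.4f}, I(S;W) = {mi:.4f} nats, bound = {bound:.4f}")
```

Computing the bound in closed form over type classes keeps the example exact, with no Monte Carlo error. The guard `max(mi, 0.0)` only absorbs floating-point round-off near $I(S;W)=0$.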
