Statistical Approximating Distributions Under Differential Privacy

Statistics computed from data are viewed as random variables. When they are used for tasks like hypothesis testing and confidence intervals, their true finite sample distributions are often replaced by approximating distributions that are easier to work with (for example, the Gaussian, which is justified by the Central Limit Theorem). When data are perturbed to satisfy differential privacy, the approximating distributions also need to be modified. Prior work provided various competing methods for creating such approximating distributions, with little formal justification beyond the fact that they worked well empirically. In this paper, we study the question of how to generate statistical approximating distributions for differentially private statistics, and we provide finite sample guarantees for the quality of the approximations.
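
To make the need for a modified approximating distribution concrete, the following is a minimal sketch rather than the method of this paper. It assumes a sample mean of data bounded in [lower, upper], released with the Laplace mechanism; the function names and parameters are hypothetical and chosen only for illustration. The naive approximation keeps the CLT Gaussian unchanged, while the noise-aware one adds an independent Laplace term, which inflates the variance.

```python
import random
import statistics


def approx_variances(sigma, n, epsilon, lower=0.0, upper=1.0):
    """Return (naive, adjusted) variances for a privately released mean.

    Assumption: records lie in [lower, upper] and the mean is released
    with Laplace noise calibrated to replace-one sensitivity.
    """
    sensitivity = (upper - lower) / n   # changing one record moves the mean by at most this
    scale = sensitivity / epsilon       # Laplace scale b
    naive = sigma ** 2 / n              # variance of the CLT Gaussian alone
    adjusted = naive + 2 * scale ** 2   # Var(Laplace(b)) = 2 * b^2
    return naive, adjusted


def sample_adjusted(mu, sigma, n, epsilon, size, rng, lower=0.0, upper=1.0):
    """Draw from the Gaussian-plus-Laplace approximating distribution."""
    scale = (upper - lower) / (n * epsilon)
    draws = []
    for _ in range(size):
        gaussian = rng.gauss(mu, sigma / n ** 0.5)
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace distributed with location 0 and scale `scale`.
        laplace = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        draws.append(gaussian + laplace)
    return draws


if __name__ == "__main__":
    naive, adjusted = approx_variances(sigma=0.5, n=100, epsilon=0.1)
    rng = random.Random(0)
    draws = sample_adjusted(0.5, 0.5, 100, 0.1, size=20000, rng=rng)
    print("naive variance:   ", naive)
    print("adjusted variance:", adjusted)
    print("simulated variance:", statistics.variance(draws))
```

With a small privacy budget the Laplace term dominates, so a confidence interval built from the naive Gaussian would be far too narrow. How well any such combined distribution matches the true finite sample distribution is the question the paper addresses.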

[1]  Martin J. Wainwright,et al.  Minimax Optimal Procedures for Locally Private Estimation , 2016, ArXiv.

[2]  Huanyu Zhang,et al.  Differentially Private Testing of Identity and Closeness of Discrete Distributions , 2017, NeurIPS.

[3]  D. A. Edwards On the Kantorovich–Rubinstein theorem , 2011 .

[4]  Daniel Kifer,et al.  A New Class of Private Chi-Square Hypothesis Tests , 2017, AISTATS.

[5]  Vishesh Karwa,et al.  Finite Sample Differentially Private Confidence Intervals , 2017, ITCS.

[6]  Constantinos Daskalakis,et al.  Priv'IT: Private and Sample Efficient Identity Testing , 2017, ICML.

[7]  A. C. Berry The accuracy of the Gaussian approximation to the sum of independent variates , 1941 .

[8]  Vitaly Shmatikov,et al.  Privacy-preserving data exploration in genome-wide association studies , 2013, KDD.

[9]  Ryan M. Rogers,et al.  Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing , 2016, ICML 2016.

[10]  Aleksandra B. Slavkovic,et al.  Differential Privacy for Clinical Trial Data: Preliminary Evaluations , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[11]  Ashwin Machanavajjhala,et al.  Pufferfish , 2014, ACM Trans. Database Syst..

[12]  Vishesh Karwa,et al.  Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs , 2012, 1205.4697.

[13]  Stephen E. Fienberg,et al.  Privacy-Preserving Data Sharing for Genome-Wide Association Studies , 2012, J. Priv. Confidentiality.

[14]  Eftychia Solea,et al.  Differentially Private Hypothesis Testing For Normal Random Variables. , 2014 .

[15]  Nathan Ross Fundamentals of Stein's method , 2011, 1109.1880.

[16]  Cynthia Dwork,et al.  Differential privacy and robust statistics , 2009, STOC '09.

[17]  Thomas Steinke,et al.  Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds , 2016, TCC.

[18]  Anne-Sophie Charest,et al.  How Can We Analyze Differentially-Private Synthetic Datasets? , 2011, J. Priv. Confidentiality.

[19]  Serena Arima,et al.  Small area estimation with covariates perturbed for disclosure limitation , 2015 .

[20]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[21]  Vito D'Orazio,et al.  Differential Privacy for Social Science Inference , 2015 .

[22]  T. Ferguson A Course in Large Sample Theory , 1996 .

[23]  Yue Wang,et al.  Differentially Private Hypothesis Testing, Revisited , 2015, ArXiv.

[24]  Stephen E. Fienberg,et al.  Differential Privacy and the Risk-Utility Tradeoff for Multi-dimensional Contingency Tables , 2010, Privacy in Statistical Databases.

[25]  Kamalika Chaudhuri,et al.  Convergence Rates for Differentially Private Statistical Estimation , 2012, ICML.

[26]  Robert E. Gaunt Stein’s method for functions of multivariate normal random variables , 2015, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques.

[27]  Aleksandra B. Slavkovic,et al.  Differentially Private Uniformly Most Powerful Tests for Binomial Data , 2018, NeurIPS.

[28]  Stephen E. Fienberg,et al.  Scalable privacy-preserving data sharing methodology for genome-wide association studies , 2014, J. Biomed. Informatics.

[29]  C. Stein A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , 1972 .

[30]  Ashwin Machanavajjhala,et al.  Differentially Private Regression Diagnostics , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[31]  Y. Rinott,et al.  Confidentiality and Differential Privacy in the Dissemination of Frequency Tables , 2018, Statistical Science.

[32]  Cynthia Dwork,et al.  Private False Discovery Rate Control , 2015, ArXiv.

[33]  L. Wasserman,et al.  A Statistical Framework for Differential Privacy , 2008, 0811.2501.

[34]  Adam D. Smith,et al.  Privacy-preserving statistical estimation with optimal convergence rates , 2011, STOC '11.

[35]  Harvey Goldstein,et al.  A Probabilistic Procedure for Anonymisation and Analysis of Perturbed Datasets , 2018 .

[36]  Or Sheffet Differentially Private Ordinary Least Squares: $t$-Values, Confidence Intervals and Rejecting Null-Hypotheses , 2015 .