Private Testing of Distributions via Sample Permutations

Statistical tests are at the heart of many scientific tasks. To validate their hypothesis, researchers in medical and social sciences use individuals' data. The sensitivity of participants' data requires the design of statistical tests that ensure the privacy of the individuals in the most efficient way. In this paper, we use the framework of property testing to design algorithms to test the properties of the distribution that the data is drawn from with respect to differential privacy. In particular, we investigate testing two fundamental properties of distributions: (1) testing the equivalence of two distributions when we have unequal numbers of samples from the two distributions. (2) Testing independence of two random variables. In both cases, we show that our testers achieve near optimal sample complexity (up to logarithmic factors). Moreover, our dependence on the privacy parameter is an additive term, which indicates that differential privacy can be obtained in most regimes of parameters for free.

[1]  Ronitt Rubinfeld,et al.  Testing Closeness of Discrete Distributions , 2010, JACM.

[2]  E. S. Pearson,et al.  On the Problem of the Most Efficient Tests of Statistical Hypotheses , 1933 .

[3]  Ronitt Rubinfeld,et al.  Testing Shape Restrictions of Discrete Distributions , 2015, Theory of Computing Systems.

[4]  Tugkan Batu,et al.  Generalized Uniformity Testing , 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[5]  K. Pearson On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling , 1900 .

[6]  Himanshu Tyagi,et al.  Test without Trust: Optimal Locally Private Distribution Testing , 2018, AISTATS.

[7]  Daniel M. Kane,et al.  Sharp Bounds for Generalized Uniformity Testing , 2017, Electron. Colloquium Comput. Complex..

[8]  Constantinos Daskalakis,et al.  Priv'IT: Private and Sample Efficient Identity Testing , 2017, ICML.

[9]  Jun Sakuma,et al.  Differentially Private Chi-squared Test by Unit Circle Mechanism , 2017, ICML.

[10]  Daniel M. Kane,et al.  Testing Identity of Structured Distributions , 2014, SODA.

[11]  Ryan M. Rogers,et al.  Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing , 2016, ICML 2016.

[12]  Gregory Valiant,et al.  An Automatic Inequality Prover and Instance Optimal Identity Testing , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[13]  Marco Gaboardi,et al.  Local Private Hypothesis Testing: Chi-Square Tests , 2017, ICML.

[14]  Cynthia Dwork,et al.  The Differential Privacy Frontier (Extended Abstract) , 2009, TCC.

[15]  Constantinos Daskalakis,et al.  Optimal Testing for Properties of Distributions , 2015, NIPS.

[16]  Ronitt Rubinfeld,et al.  Sublinear algorithms for testing monotone and unimodal distributions , 2004, STOC '04.

[17]  Constantinos Daskalakis,et al.  Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing , 2016, COLT.

[18]  Clément L. Canonne,et al.  A Survey on Distribution Testing: Your Data is Big. But is it Blue? , 2020, Electron. Colloquium Comput. Complex..

[19]  Daniel M. Kane,et al.  Testing Bayesian Networks , 2016, IEEE Transactions on Information Theory.

[20]  Gregory Valiant,et al.  Testing Closeness With Unequal Sized Samples , 2015, NIPS.

[21]  Ronitt Rubinfeld,et al.  Differentially Private Identity and Equivalence Testing of Discrete Distributions , 2018, ICML.

[22]  Oded Goldreich,et al.  Introduction to Property Testing , 2017 .

[23]  Or Sheffet,et al.  Locally Private Hypothesis Testing , 2018, ICML.

[24]  Daniel M. Kane,et al.  Near-Optimal Closeness Testing of Discrete Histogram Distributions , 2017, ICALP.

[25]  Liam Paninski,et al.  A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data , 2008, IEEE Transactions on Information Theory.

[26]  Daniel M. Kane,et al.  Testing Identity of Multidimensional Histograms , 2018, COLT.

[27]  Ronitt Rubinfeld,et al.  Testing that distributions are close , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[28]  Daniel Kifer,et al.  Revisiting Differentially Private Hypothesis Tests for Categorical Data , 2015 .

[29]  Ilias Diakonikolas,et al.  Collision-based Testers are Optimal for Uniformity and Closeness , 2016, Electron. Colloquium Comput. Complex..

[30]  Alon Orlitsky,et al.  Sublinear algorithms for outlier detection and generalized closeness testing , 2014, 2014 IEEE International Symposium on Information Theory.

[31]  Ronitt Rubinfeld,et al.  Testing Properties of Collections of Distributions , 2013, Theory Comput..

[32]  Ronitt Rubinfeld Taming big probability distributions , 2012, XRDS.

[33]  Daniel M. Kane,et al.  Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[34]  Daniel Kifer,et al.  A New Class of Private Chi-Square Tests , 2016, ArXiv.

[35]  Daniel M. Kane,et al.  Testing Conditional Independence of Discrete Distributions , 2018, 2018 Information Theory and Applications Workshop (ITA).

[36]  Ilias Diakonikolas,et al.  Sample-Optimal Identity Testing with High Probability , 2017, Electron. Colloquium Comput. Complex..

[37]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[38]  Ilias Diakonikolas,et al.  Fourier-Based Testing for Families of Distributions , 2017, Electron. Colloquium Comput. Complex..

[39]  Huanyu Zhang,et al.  Differentially Private Testing of Identity and Closeness of Discrete Distributions , 2017, NeurIPS.

[40]  Ilias Diakonikolas,et al.  Differentially Private Learning of Structured Discrete Distributions , 2015, NIPS.

[41]  Ilias Diakonikolas,et al.  Optimal Algorithms for Testing Closeness of Discrete Distributions , 2013, SODA.

[42]  Daniel M. Kane,et al.  A New Approach for Testing Properties of Discrete Distributions , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[43]  Rocco A. Servedio,et al.  Testing k-Modal Distributions: Optimal Algorithms via Reductions , 2011, SODA.

[44]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[45]  Ronitt Rubinfeld,et al.  Testing random variables for independence and identity , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.