Private Identity Testing for High-Dimensional Distributions

In this work we present novel differentially private identity (goodness-of-fit) testers for natural and widely studied classes of multivariate product distributions: Gaussians in $\mathbb{R}^d$ with known covariance and product distributions over $\{\pm 1\}^{d}$. Our testers have improved sample complexity compared to those derived from previous techniques, and are the first testers whose sample complexity matches the order-optimal minimax sample complexity of $O(d^{1/2}/\alpha^2)$ in many parameter regimes. We construct two types of testers, exhibiting tradeoffs between sample complexity and computational complexity. Finally, we provide a two-way reduction between testing a subclass of multivariate product distributions and testing univariate distributions, and thereby obtain upper and lower bounds for testing this subclass of product distributions.
