Inference Under Information Constraints III: Local Privacy Constraints

We study goodness-of-fit and independence testing of discrete distributions in a setting where samples are distributed across multiple users. The users wish to preserve the privacy of their data while enabling a central server to perform the tests. Under the notion of local differential privacy, we propose simple, sample-optimal, and communication-efficient protocols for these two questions in the noninteractive setting, both when users share a common random seed and when they do not. In particular, we show that the availability of shared (public) randomness greatly reduces the sample complexity. Underlying our public-coin protocols are privacy-preserving mappings which, when applied to the samples, minimally contract the distance between their respective probability distributions.

A preliminary version of this work containing partial results appeared in the Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019 [1].

∗Cornell University. Email: acharya@cornell.edu. Supported by NSF-CCF-1846300 (CAREER), NSF-CCF-1815893, and a Google Faculty Research Award.

†University of Sydney. Email: ccanonne@cs.columbia.edu. This work was performed while a Goldstine Postdoctoral Fellow at IBM Research, and a Motwani Postdoctoral Fellow at Stanford University.

‡Cornell Tech. Email: cfreitag@cs.cornell.edu. Supported in part by NSF GRFP award DGE-1650441.

§Cornell University. Email: zs335@cornell.edu. Supported in part by NSF-CCF-1846300 (CAREER).

¶Indian Institute of Science. Email: htyagi@iisc.ac.in. Supported in part by a research grant from the Robert Bosch Center for Cyberphysical Systems (RBCCPS), Indian Institute of Science, Bangalore.

[1]  Martin J. Wainwright,et al.  Local privacy and statistical minimax rates , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[2]  Ilias Diakonikolas,et al.  Sample-Optimal Identity Testing with High Probability , 2017, Electron. Colloquium Comput. Complex..

[3]  Oded Goldreich,et al.  Introduction to Property Testing , 2017 .

[4]  S. L. Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias , 1965, Journal of the American Statistical Association.

[5]  Seth Neel,et al.  The Role of Interactivity in Local Differential Privacy , 2019, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS).

[6]  Daniel Kifer,et al.  A New Class of Private Chi-Square Hypothesis Tests , 2017, AISTATS.

[7]  Himanshu Tyagi,et al.  Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit , 2019, Electron. Colloquium Comput. Complex..

[8]  Clément L. Canonne.  Big Data on the Rise? - Testing Monotonicity of Distributions , 2015, ICALP.

[9]  Jieming Mao,et al.  Connecting Robust Shuffle Privacy and Pan-Privacy , 2020, SODA.

[10]  Himanshu Tyagi,et al.  Inference Under Information Constraints I: Lower Bounds From Chi-Square Contraction , 2018, IEEE Transactions on Information Theory.

[11]  Ronitt Rubinfeld.  Taming big probability distributions , 2012, XRDS.

[12]  J. Tsitsiklis.  Decentralized Detection , 1993 .

[13]  Himanshu Tyagi,et al.  Test without Trust: Optimal Locally Private Distribution Testing , 2018, AISTATS.

[14]  Huanyu Zhang,et al.  Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication , 2018, AISTATS.

[15]  Ronitt Rubinfeld,et al.  Private Testing of Distributions via Sample Permutations , 2019, NeurIPS.

[16]  Himanshu Tyagi,et al.  Interactive Inference Under Information Constraints , 2020, IEEE Transactions on Information Theory.

[17]  Marco Gaboardi,et al.  Local Private Hypothesis Testing: Chi-Square Tests , 2017, ICML.

[18]  Daniel Kifer,et al.  Revisiting Differentially Private Hypothesis Tests for Categorical Data , 2015 .

[19]  Ronitt Rubinfeld,et al.  Differentially Private Identity and Equivalence Testing of Discrete Distributions , 2018, ICML.

[20]  Peter Kairouz,et al.  Discrete Distribution Estimation under Local Privacy , 2016, ICML.

[21]  Daniel M. Kane,et al.  Testing Bayesian Networks , 2016, IEEE Transactions on Information Theory.

[22]  Daniel M. Kane,et al.  A New Approach for Testing Properties of Discrete Distributions , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[23]  Liam Paninski,et al.  A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data , 2008, IEEE Transactions on Information Theory.

[24]  Himanshu Tyagi,et al.  Inference Under Information Constraints II: Communication Constraints and Shared Randomness , 2019, IEEE Transactions on Information Theory.

[25]  Daniel Kifer,et al.  A New Class of Private Chi-Square Tests , 2016, ArXiv.

[26]  Sivaraman Balakrishnan,et al.  Hypothesis Testing for High-Dimensional Multinomials: A Selective Review , 2017, ArXiv.

[27]  Kareem Amin,et al.  Pan-Private Uniformity Testing , 2019, COLT.

[28]  Clément L. Canonne,et al.  Distribution Testing Lower Bounds via Reductions from Communication Complexity , 2017, Computational Complexity Conference.

[29]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[30]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[31]  Ronitt Rubinfeld,et al.  Testing Properties of Collections of Distributions , 2013, Theory Comput..

[32]  Sean P. Meyn,et al.  Generalized Error Exponents for Small Sample Universal Hypothesis Testing , 2012, IEEE Transactions on Information Theory.

[33]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[34]  Ronitt Rubinfeld,et al.  Testing random variables for independence and identity , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[35]  Ryan M. Rogers,et al.  Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing , 2016, ICML.

[36]  Or Sheffet,et al.  Locally Private Hypothesis Testing , 2018, ICML.

[37]  Cristina Butucea,et al.  Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms , 2020, NeurIPS.

[38]  Ilias Diakonikolas,et al.  Optimal Algorithms for Testing Closeness of Discrete Distributions , 2013, SODA.

[39]  Constantinos Daskalakis,et al.  Priv'IT: Private and Sample Efficient Identity Testing , 2017, ICML.

[40]  Gregory Valiant,et al.  An Automatic Inequality Prover and Instance Optimal Identity Testing , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[41]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[42]  Jayadev Acharya,et al.  Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters , 2019, ICML.

[43]  Raef Bassily,et al.  Practical Locally Private Heavy Hitters , 2017, NIPS.

[44]  Oded Goldreich.  The uniform distribution is complete with respect to testing identity to a fixed distribution , 2016, Electron. Colloquium Comput. Complex..

[45]  Huanyu Zhang,et al.  Differentially Private Testing of Identity and Closeness of Discrete Distributions , 2017, NeurIPS.

[46]  Jonathan Ullman,et al.  Private Identity Testing for High-Dimensional Distributions , 2019, NeurIPS.

[47]  Constantinos Daskalakis,et al.  Optimal Testing for Properties of Distributions , 2015, NIPS.