Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit

We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., shared randomness) leads to a significant reduction in the sample complexity of this problem. In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. We provide a general distributed goodness-of-fit protocol that as a function of the amount of shared randomness interpolates smoothly between the private- and public-coin sample complexities. We complement our upper bound with a general framework to prove lower bounds on the sample complexity of this testing problems under limited shared randomness. Finally, we instantiate our bounds for the two archetypal information constraints of communication and local privacy, and show that our sample complexity bounds are optimal as a function of all the parameters of the problem, including the amount of shared randomness. A key component of our upper bounds is a new primitive of domain compression, a tool that allows us to map distributions to a much smaller domain size while preserving their pairwise distances, using a limited amount of randomness.

[1]  A. Wigderson,et al.  Disperser graphs, deterministic amplification, and imperfect random sources (גרפים מפזרים, הגברה דטרמיניסטית ומקורות אקראים חלשים.) , 1991 .

[2]  Himanshu Tyagi,et al.  Inference Under Information Constraints I: Lower Bounds From Chi-Square Contraction , 2018, IEEE Transactions on Information Theory.

[3]  Ronitt Rubinfeld,et al.  Testing Closeness of Discrete Distributions , 2010, JACM.

[4]  J. Lindenstrauss,et al.  Extensions of lipschitz maps into Banach spaces , 1986 .

[5]  Daniel M. Kane,et al.  Communication and Memory Efficient Testing of Discrete Distributions , 2019, COLT.

[6]  Oded Goldreich The uniform distribution is complete with respect to testing identity to a fixed distribution , 2016, Electron. Colloquium Comput. Complex..

[7]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[8]  Clément L. Canonne,et al.  A Survey on Distribution Testing: Your Data is Big. But is it Blue? , 2020, Electron. Colloquium Comput. Complex..

[9]  Karl Pearson F.R.S. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling , 2009 .

[10]  Martin J. Wainwright,et al.  Minimax Optimal Procedures for Locally Private Estimation , 2016, ArXiv.

[11]  Alexander Barg,et al.  Optimal Schemes for Discrete Distribution Estimation Under Locally Differential Privacy , 2017, IEEE Transactions on Information Theory.

[12]  Daniel M. Kane,et al.  Testing Bayesian Networks , 2016, IEEE Transactions on Information Theory.

[13]  Oded Goldreich,et al.  Introduction to Property Testing , 2017 .

[14]  Himanshu Tyagi,et al.  Communication-Constrained Inference and the Role of Shared Randomness , 2019, ICML.

[15]  Ronitt Rubinfeld,et al.  Testing that distributions are close , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[16]  Or Sheffet,et al.  Locally Private Hypothesis Testing , 2018, ICML.

[17]  Jayadev Acharya,et al.  Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters , 2019, ICML.

[18]  Liam Paninski,et al.  A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data , 2008, IEEE Transactions on Information Theory.

[19]  Yanjun Han,et al.  Learning Distributions from their Samples under Communication Constraints , 2019, ArXiv.

[20]  Himanshu Tyagi,et al.  Inference Under Information Constraints II: Communication Constraints and Shared Randomness , 2019, IEEE Transactions on Information Theory.

[21]  Huanyu Zhang,et al.  Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication , 2018, AISTATS.

[22]  Peter Kairouz,et al.  Discrete Distribution Estimation under Local Privacy , 2016, ICML.

[23]  Yanjun Han,et al.  Geometric Lower Bounds for Distributed Parameter Estimation Under Communication Constraints , 2018, IEEE Transactions on Information Theory.

[24]  Marco Gaboardi,et al.  Local Private Hypothesis Testing: Chi-Square Tests , 2017, ICML.

[25]  Oded Goldreich,et al.  On the power of two-point based sampling , 1989, J. Complex..

[26]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[27]  Rotem Oshman,et al.  Distributed Uniformity Testing , 2018, PODC.

[28]  N. Linial,et al.  Expander Graphs and their Applications , 2006 .

[29]  Jonathan Ullman,et al.  Private Identity Testing for High-Dimensional Distributions , 2019, NeurIPS.

[30]  Dana Ron,et al.  On Testing Expansion in Bounded-Degree Graphs , 2000, Studies in Complexity and Cryptography.

[31]  Sivaraman Balakrishnan,et al.  Hypothesis Testing for High-Dimensional Multinomials: A Selective Review , 2017, ArXiv.

[32]  K. Pearson On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling , 1900 .

[33]  G. Pisier The volume of convex bodies and Banach space geometry , 1989 .

[34]  J. M. Phillips,et al.  JOHNSON-LINDENSTRAUSS DIMENSIONALITY REDUCTION ON THE SIMPLEX , 2010 .

[35]  Ilias Diakonikolas,et al.  Sample-Optimal Identity Testing with High Probability , 2017, Electron. Colloquium Comput. Complex..

[36]  Gregory Valiant,et al.  An Automatic Inequality Prover and Instance Optimal Identity Testing , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[37]  B. Bollobás THE VOLUME OF CONVEX BODIES AND BANACH SPACE GEOMETRY (Cambridge Tracts in Mathematics 94) , 1991 .

[38]  Constantinos Daskalakis,et al.  Optimal Testing for Properties of Distributions , 2015, NIPS.

[39]  Seshadhri Comandur,et al.  Testing Expansion in Bounded Degree Graphs , 2007, Electron. Colloquium Comput. Complex..

[40]  Himanshu Tyagi,et al.  Test without Trust: Optimal Locally Private Distribution Testing , 2018, AISTATS.

[41]  Yanjun Han,et al.  Distributed Statistical Estimation of High-Dimensional and Nonparametric Distributions , 2018, 2018 IEEE International Symposium on Information Theory (ISIT).