Distribution Testing Lower Bounds via Reductions from Communication Complexity

We present a new methodology for proving distribution testing lower bounds, establishing a connection between distribution testing and the simultaneous message passing (SMP) communication model. Extending the framework of Blais, Brody, and Matulef [15], we show a simple way to reduce (private-coin) SMP problems to distribution testing problems. This method allows us to prove new distribution testing lower bounds, as well as to provide simple proofs of known lower bounds. Our main result is concerned with testing identity to a specific distribution, p, given as a parameter. In a recent and influential work, Valiant and Valiant [55] showed that the sample complexity of the aforementioned problem is closely related to the ℓ_{2/3}-quasinorm of p. We obtain alternative bounds on the complexity of this problem in terms of an arguably more intuitive measure and using simpler proofs. More specifically, we prove that the sample complexity is essentially determined by a fundamental operator in the theory of interpolation of Banach spaces, known as Peetre’s K-functional. We show that this quantity is closely related to the size of the effective support of p (loosely speaking, the number of supported elements that constitute the vast majority of the mass of p). This result, in turn, stems from an unexpected connection to functional analysis and refined concentration of measure inequalities, which arise naturally in our reduction.
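As a rough numerical illustration of two quantities mentioned in the abstract, the following Python sketch computes the ℓ_{2/3}-quasinorm of a distribution and one simple notion of effective support size. The function names and the parameter `eps` (the fraction of mass that may be ignored) are assumptions made for this illustration; the paper's formal definitions may differ in detail.

```python
import math


def quasinorm_two_thirds(p):
    """Return the l_{2/3}-quasinorm of p: (sum_i |p_i|^(2/3))^(3/2)."""
    return sum(abs(x) ** (2.0 / 3.0) for x in p) ** 1.5


def effective_support_size(p, eps):
    """Return the smallest number of elements whose total mass is >= 1 - eps.

    This is one informal reading of "effective support" (an assumption,
    not necessarily the definition used in the paper).
    """
    total = 0.0
    # Take the heaviest elements first, so the count is as small as possible.
    for count, mass in enumerate(sorted(p, reverse=True), start=1):
        total += mass
        if total >= 1.0 - eps:
            return count
    return len(p)


if __name__ == "__main__":
    uniform = [0.25] * 4
    skewed = [0.5, 0.25, 0.125, 0.125]
    print(quasinorm_two_thirds(uniform))
    print(effective_support_size(skewed, 0.2))
```

For the uniform distribution over n elements the quasinorm equals the square root of n, which matches the familiar square-root dependence on the domain size in uniformity testing.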

[1] Doctoral Thesis, et al. Instituto de Matematica Pura e Aplicada, 2009.

[2] Ronitt Rubinfeld, et al. Learning and Testing Junta Distributions, 2016, COLT.

[3] Ilias Diakonikolas, et al. Sample-Optimal Identity Testing with High Probability, 2017, Electron. Colloquium Comput. Complex.

[4] Daniel M. Kane, et al. Testing Identity of Structured Distributions, 2014, SODA.

[5] Ilan Newman, et al. Property Testing of Massively Parametrized Problems - A Survey, 2010, Property Testing.

[6] Ronitt Rubinfeld, et al. Testing that distributions are close, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[7] Clément L. Canonne. Are Few Bins Enough: Testing Histogram Distributions, 2016, PODS.

[8] Optimal Algorithms, 1989, Lecture Notes in Computer Science.

[9] Constantinos Daskalakis, et al. Optimal Testing for Properties of Distributions, 2015, NIPS.

[10] Gregory Valiant, et al. A CLT and tight lower bounds for estimating entropy, 2010, Electron. Colloquium Comput. Complex.

[11] Dana Ron, et al. Strong Lower Bounds for Approximating Distribution Support Size and the Distinct Elements Problem, 2009, SIAM J. Comput.

[12] Tord Holmstedt, et al. Interpolation of Quasi-Normed Spaces, 1970.

[13] Seshadhri Comandur, et al. Testing Expansion in Bounded Degree Graphs, 2007, Electron. Colloquium Comput. Complex.

[14] Ronitt Rubinfeld, et al. Testing monotonicity of distributions over general partial orders, 2011, ICS.

[15] Gregory Valiant, et al. Estimating the unseen: A sublinear-sample canonical estimator of distributions, 2010, Electron. Colloquium Comput. Complex.

[16] Ronitt Rubinfeld, et al. Testing Properties of Collections of Distributions, 2013, Theory Comput.

[17] C. Bennett, et al. Interpolation of operators, 1987.

[18] Constantinos Daskalakis, et al. Testing Poisson Binomial Distributions, 2014, SODA.

[19] Gregory Valiant, et al. Testing Closeness With Unequal Sized Samples, 2015, NIPS.

[20] Pawel Hitczenko, et al. On the Rademacher Series, 1994.

[21] Alon Orlitsky, et al. Competitive Closeness Testing, 2011, COLT.

[22] Gregory Valiant, et al. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, 2011, STOC '11.

[23] Yoram Sagher. Review: Colin Bennett and Robert Sharpley, Interpolation of operators, 1990.

[24] Clément L. Canonne, et al. A Chasm Between Identity and Equivalence Testing with Conditional Queries, 2014, APPROX-RANDOM.

[25] Oded Goldreich. The uniform distribution is complete with respect to testing identity to a fixed distribution, 2016, Electron. Colloquium Comput. Complex.

[26] Ronitt Rubinfeld, et al. Sublinear algorithms for testing monotone and unimodal distributions, 2004, STOC '04.

[27] Clément L. Canonne, et al. A Survey on Distribution Testing: Your Data is Big. But is it Blue?, 2020, Electron. Colloquium Comput. Complex.

[28] Joshua Brody, et al. Lower Bounds for Testing Computability by Small Width OBDDs, 2011, TAMC.

[29] Tsachy Weissman, et al. Order-Optimal Estimation of Functionals of Discrete Distributions, 2014, ArXiv.

[30] Rocco A. Servedio, et al. Testing probability distributions using conditional samples, 2012, Electron. Colloquium Comput. Complex.

[31] Ronitt Rubinfeld. Taming big probability distributions, 2012, XRDS.

[32] Dana Ron, et al. Property testing and its connection to learning and approximation, 1998, JACM.

[33] Sergey V. Astashkin, et al. Rademacher functions in symmetric spaces, 2010.

[34] Ronitt Rubinfeld, et al. Testing Shape Restrictions of Discrete Distributions, 2015, Theory of Computing Systems.

[35] Gregory Valiant, et al. An Automatic Inequality Prover and Instance Optimal Identity Testing, 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[36] Daniel M. Kane, et al. Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions, 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[37] Ronitt Rubinfeld, et al. Approximating and testing k-histogram distributions in sub-linear time, 2012, PODS '12.

[38] Clément L. Canonne. Big Data on the Rise? - Testing Monotonicity of Distributions, 2015, ICALP.

[39] R. Servedio, et al. Testing monotone high-dimensional distributions, 2009.

[40] Liam Paninski, et al. A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data, 2008, IEEE Transactions on Information Theory.

[41] S. Montgomery-Smith. The distribution of Rademacher sums, 1990.

[42] Alon Orlitsky, et al. Faster Algorithms for Testing under Conditional Sampling, 2015, COLT.

[43] Ronitt Rubinfeld, et al. Robust Characterizations of Polynomials with Applications to Program Testing, 1996, SIAM J. Comput.

[44] Dana Ron, et al. On Testing Expansion in Bounded-Degree Graphs, 2000, Studies in Complexity and Cryptography.

[45] Ilias Diakonikolas, et al. Optimal Algorithms for Testing Closeness of Discrete Distributions, 2013, SODA.

[46] Daniel M. Kane, et al. A New Approach for Testing Properties of Discrete Distributions, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[47] Rocco A. Servedio, et al. Testing k-Modal Distributions: Optimal Algorithms via Reductions, 2011, SODA.

[48] Liam Paninski, et al. Estimating entropy on m bins given fewer than m samples, 2004, IEEE Transactions on Information Theory.

[49] Paul Valiant. Testing symmetric properties of distributions, 2008, STOC '08.

[50] Eldar Fischer, et al. On the power of conditional samples in distribution testing, 2013, ITCS '13.

[51] Yihong Wu, et al. Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation, 2014, IEEE Transactions on Information Theory.

[52] Ronitt Rubinfeld, et al. Testing Closeness of Discrete Distributions, 2010, JACM.

[53] E. Fischer, et al. Improving and Extending the Testing of Distributions for Shape-Restricted Properties, 2019, Algorithmica.

[54] Ilan Newman, et al. Public vs. private coin flips in one round communication games (extended abstract), 1996, STOC '96.

[55] Bin Yu. Assouad, Fano, and Le Cam, 1997.

[56] Ronitt Rubinfeld, et al. The complexity of approximating entropy, 2002, STOC '02.

[57] Ronitt Rubinfeld, et al. Testing random variables for independence and identity, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[58] Gregory Valiant, et al. The Power of Linear Estimators, 2011, 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science.

[59] Joshua Brody, et al. Property Testing Lower Bounds via Communication Complexity, 2011, 2011 IEEE 26th Annual Conference on Computational Complexity.

[60] O. Vorobyev, et al. Discrete multivariate distributions, 2008, 0811.0406.