Testing symmetric properties of distributions

We introduce the notion of a Canonical Tester for a class of properties on distributions, that is, a tester strong and general enough that "a distribution property in the class is testable if and only if the Canonical Tester tests it". We construct a Canonical Tester for the class of symmetric properties of one or two distributions, satisfying a certain weak continuity condition. Analyzing the performance of the Canonical Tester on specific properties resolves several open problems, establishing lower bounds that match known upper bounds: we show that distinguishing between entropy <α or >β on distributions over [n] requires $n^{\alpha/\beta - o(1)}$ samples, and distinguishing whether a pair of distributions has statistical distance <α or >β requires $n^{1 - o(1)}$ samples. Our techniques also resolve a conjecture about a property that our Canonical Tester does not apply to: distinguishing identical distributions from those with statistical distance >β requires $\Omega(n^{2/3})$ samples.
