Paper information - Telling Two Distributions Apart: a Tight Characterization

We consider the problem of distinguishing between two arbitrary black-box distributions defined over the domain $[n]$, given access to $s$ samples from both. It is known that in the worst case $O(n^{2/3})$ samples are both necessary and sufficient, provided that the distributions have $L_1$ difference of at least $\epsilon$. However, it is also known that in many cases fewer samples suffice. We identify a new parameter that provides an upper bound on how many samples are needed, and present an efficient algorithm that requires a number of samples independent of the domain size. In addition, for a large subclass of distributions we provide a lower bound that matches our upper bound up to a poly-logarithmic factor.
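To make the problem setup concrete, here is a minimal Python sketch of the sampling model: two black-box distributions over a domain of size n, s samples drawn from each, and a decision based only on those samples. This is not the algorithm from the paper, which the abstract does not describe. It is a naive plug-in baseline that compares empirical histograms, and in general it needs a number of samples that grows with n. The function names, the example distributions, and the sample size are all illustrative assumptions.

```python
import random
from collections import Counter


def draw_samples(probs, s, rng):
    """Draw s i.i.d. samples from a distribution over {0, ..., n-1}."""
    return rng.choices(range(len(probs)), weights=probs, k=s)


def l1_distance(p, q):
    """Exact L1 distance between two known probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def empirical_l1(samples_p, samples_q, n):
    """Naive plug-in estimate: L1 distance between the two empirical
    histograms. Illustrative baseline only, not the paper's tester."""
    count_p, count_q = Counter(samples_p), Counter(samples_q)
    s_p, s_q = len(samples_p), len(samples_q)
    return sum(abs(count_p[i] / s_p - count_q[i] / s_q) for i in range(n))


# Hypothetical example: a uniform distribution versus a skewed one on [4].
rng = random.Random(0)
n = 4
p = [0.25, 0.25, 0.25, 0.25]
q = [0.55, 0.15, 0.15, 0.15]
s = 2000  # assumed sample budget, chosen generously for this tiny domain

estimate = empirical_l1(draw_samples(p, s, rng), draw_samples(q, s, rng), n)
print(f"true L1 gap = {l1_distance(p, q):.2f}, plug-in estimate = {estimate:.2f}")
```

On a domain this small the plug-in estimate lands close to the true gap. The point of the results summarized above is that distinguishing the two distributions can take far fewer samples than learning both histograms, and for some pairs a number that does not depend on n at all.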

Mark Sandler | Eyal Even-Dar

[1] Dana Ron, et al. Property testing and its connection to learning and approximation, 1998, JACM.

[2] Ronitt Rubinfeld, et al. Testing that distributions are close, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[3] Ronitt Rubinfeld, et al. Sublinear algorithms for testing monotone and unimodal distributions, 2004, STOC '04.

[4] Paul Valiant. Testing symmetric properties of distributions, 2008, STOC '08.

[5] Ronitt Rubinfeld, et al. Testing random variables for independence and identity, 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science.

[6] W. Hoeffding. Probability inequalities for sums of bounded random variables, 1963.

[7] Colin McDiarmid. On the method of bounded differences, 1989, Surveys in Combinatorics.

[8] Shai Ben-David, et al. Detecting Change in Data Streams, 2004, VLDB.

[9] Ronitt Rubinfeld, et al. The complexity of approximating the entropy, 2002, Proceedings 17th IEEE Annual Conference on Computational Complexity.