Testing Similar Means

We consider the problem of testing a basic property of collections of distributions: having similar means. Namely, the algorithm should accept collections of distributions in which all distributions have means that do not differ by more than some given parameter, and should reject collections that are relatively far from having this property. By "far" we mean that it is necessary to modify the distributions in a relatively significant manner (measured according to the ℓ1 distance averaged over the distributions) so as to obtain the property. We study this problem in two models. In the first model (the query model) the algorithm may ask for samples from any distribution of its choice, and in the second model (the sampling model) the distributions from which it gets samples are selected randomly. We provide upper and lower bounds in both models. In particular, in the query model, the complexity of the problem is polynomial in 1/ε (where ε is the given distance parameter), while in the sampling model, the complexity grows roughly as m^(1−poly(ε)), where m is the number of distributions.
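To make the definitions concrete, here is a minimal Python sketch of the property and of the two access models. It is an illustration only, not the paper's algorithm: it assumes each distribution is given explicitly as a probability vector over a shared numeric support, that the sampling model picks the distribution index uniformly at random, and all names (`has_similar_means`, `avg_l1_distance`, `query_sample`, `sampling_model_sample`, `gamma`) are hypothetical.

```python
import random

def mean(values, probs):
    """Mean of a distribution given as a probability vector over `values`."""
    return sum(v * p for v, p in zip(values, probs))

def has_similar_means(collection, values, gamma):
    """True if no two means in the collection differ by more than gamma."""
    means = [mean(values, probs) for probs in collection]
    return max(means) - min(means) <= gamma

def avg_l1_distance(collection_a, collection_b):
    """l1 distance between corresponding distributions, averaged over the collection."""
    total = sum(
        sum(abs(p - q) for p, q in zip(a, b))
        for a, b in zip(collection_a, collection_b)
    )
    return total / len(collection_a)

def query_sample(collection, values, i, rng):
    """Query model: the tester chooses index i and receives one sample from it."""
    return rng.choices(values, weights=collection[i], k=1)[0]

def sampling_model_sample(collection, values, rng):
    """Sampling model: the index is drawn at random (uniformly, by assumption here)."""
    i = rng.randrange(len(collection))
    return i, query_sample(collection, values, i, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    values = [0, 1]
    close = [[0.5, 0.5], [0.4, 0.6]]   # means 0.5 and 0.6
    apart = [[1.0, 0.0], [0.0, 1.0]]   # means 0 and 1
    print(has_similar_means(close, values, gamma=0.2))
    print(has_similar_means(apart, values, gamma=0.2))
    print(avg_l1_distance(apart, [[0.5, 0.5], [0.5, 0.5]]))
    print(sampling_model_sample(close, values, rng))
```

A tester never sees the probability vectors themselves; it only calls one of the two sample functions, and its cost is the number of such calls.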
