Testing Probability Distributions Underlying Aggregated Data

In this paper, we study a hybrid model for testing and learning probability distributions. In addition to samples, the testing algorithm is given access to one of two types of oracles for the unknown distribution D over [n]. More precisely, we consider the dual and cumulative dual access models, in which the algorithm A can sample from D and, for any i ∈ [n], either query the probability mass D(i) (query access) or obtain the total mass of {1,…,i}, i.e. \(\sum_{j=1}^i D(j)\) (cumulative access).
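To make the two access models concrete, the following is a minimal sketch of an oracle that exposes sampling, query access, and cumulative access to a distribution D over [n]. The class name `DualOracle` and its method names are hypothetical, chosen only for illustration. The sketch also assumes D is given explicitly as a list of probabilities, which a real testing algorithm would not have.

```python
import bisect
import random
from itertools import accumulate


class DualOracle:
    """Illustrative oracle for a distribution D over [n] = {1, ..., n}.

    Hypothetical helper: the names and the explicit list representation
    are assumptions made for this sketch, not taken from the paper.
    """

    def __init__(self, probs, seed=None):
        if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probs must be non-negative and sum to 1")
        self.probs = list(probs)
        # Prefix sums: cdf[i-1] = D(1) + ... + D(i).
        self.cdf = list(accumulate(self.probs))
        self.rng = random.Random(seed)

    def _check(self, i):
        if not 1 <= i <= len(self.probs):
            raise IndexError("i must lie in [n]")

    def sample(self):
        """Draw one element of [n] according to D (inverse-CDF sampling)."""
        u = self.rng.random()
        idx = bisect.bisect_right(self.cdf, u)
        # Clamp guards against floating-point round-off in the last prefix sum.
        return min(idx, len(self.probs) - 1) + 1

    def query(self, i):
        """Query access: return the probability mass D(i)."""
        self._check(i)
        return self.probs[i - 1]

    def cumulative_query(self, i):
        """Cumulative access: return the total mass of {1, ..., i}."""
        self._check(i)
        return self.cdf[i - 1]


oracle = DualOracle([0.1, 0.2, 0.3, 0.4], seed=0)
draws = [oracle.sample() for _ in range(1000)]
```

In the dual model an algorithm would call only `sample` and `query`; in the cumulative dual model it would call `sample` and `cumulative_query`. Note that a single mass D(i) can be recovered from two cumulative queries as the difference of the masses of {1,…,i} and {1,…,i−1}.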
