Testing Identity of Structured Distributions

We study the problem of identity testing for structured distributions. More precisely, given samples from a {\em structured} distribution $q$ over $[n]$ and an explicit distribution $p$ over $[n]$, we wish to distinguish the case $q = p$ from the case that $q$ is at least $\epsilon$-far from $p$ in $L_1$ distance. We present a unified approach that yields new, simple testers with information-theoretically optimal sample complexity for broad classes of structured distributions, including $t$-flat distributions, $t$-modal distributions, log-concave distributions, monotone hazard rate (MHR) distributions, and mixtures thereof.
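To make the testing problem concrete, the following minimal sketch spells out the two cases in code. It is a naive plug-in baseline, not the tester developed in this work: it builds the empirical distribution from the samples and compares it to $p$ in $L_1$ distance. The function names, the `eps / 2` acceptance threshold, and the use of the domain $\{0, \dots, n-1\}$ in place of $[n]$ are assumptions made for illustration only.

```python
from collections import Counter


def l1_distance(p, q):
    """L1 distance between two distributions given as equal-length lists."""
    return sum(abs(a - b) for a, b in zip(p, q))


def empirical_distribution(samples, n):
    """Empirical distribution over {0, ..., n-1} built from a list of samples."""
    counts = Counter(samples)
    m = len(samples)
    return [counts.get(i, 0) / m for i in range(n)]


def plugin_identity_test(samples, p, eps):
    """Naive baseline (an assumption for illustration, not the paper's tester).

    Accepts (returns True) iff the empirical distribution of the samples
    is within eps / 2 of the explicit distribution p in L1 distance.
    """
    q_hat = empirical_distribution(samples, len(p))
    return l1_distance(q_hat, p) <= eps / 2


if __name__ == "__main__":
    p = [0.25, 0.25, 0.25, 0.25]        # explicit distribution
    same = [0, 1, 2, 3] * 25            # samples consistent with q = p
    far = [0] * 100                     # samples from a point mass, far from p
    print(plugin_identity_test(same, p, eps=0.5))
    print(plugin_identity_test(far, p, eps=0.5))
```

A plug-in test of this kind ignores any structure in $q$ and is not sample-optimal; it only illustrates what "accept if $q = p$, reject if $q$ is $\epsilon$-far" means operationally.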
