Optimal Testing for Properties of Distributions

Given samples from an unknown distribution $p$, is it possible to distinguish whether $p$ belongs to some class of distributions $\mathcal{C}$ or is far from every distribution in $\mathcal{C}$? This fundamental question has received tremendous attention in statistics, where the focus has been primarily on asymptotic analysis, and more recently in information theory and theoretical computer science, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of distributions such as monotonicity, log-concavity, unimodality, independence, and monotone hazard rate, the optimal sample complexity is unknown. We provide a general approach that yields sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm that solves the following problem: given samples from an unknown distribution $p$ and a known distribution $q$, are $p$ and $q$ close in $\chi^2$-distance, or far in total variation distance? The optimality of our testers is established by providing matching lower bounds with respect to both the domain size $n$ and the distance parameter $\varepsilon$. Finally, a necessary building block for our testers, and an important byproduct of our work, are the first known computationally efficient proper learners for discrete log-concave and monotone hazard rate distributions.
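To make the core problem concrete, the following is a minimal sketch of a $\chi^2$-versus-total-variation test, not the authors' exact procedure. It relies on the standard fact that $\chi^2(p, q) \ge 4\, d_{\mathrm{TV}}(p, q)^2$, so a distribution that is $\varepsilon$-far in total variation is also far in $\chi^2$-distance. The function names, the corrected statistic $Z = \sum_i \frac{(N_i - m q_i)^2 - N_i}{m q_i}$, and the acceptance threshold $m \varepsilon^2$ are assumptions made for illustration; the abstract does not specify the statistic or its constants. The sketch takes the per-symbol counts $N_i$ from $m$ samples as given.

```python
def chi2_distance(p, q):
    """Chi-squared distance sum_i (p_i - q_i)^2 / q_i, assuming every q_i > 0."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def tv_distance(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def chi2_statistic(counts, q, m):
    """Corrected chi-squared statistic (an assumed form, see lead-in).

    counts[i] is the number of times symbol i appeared among m samples.
    Subtracting N_i removes the variance term, so that with Poissonized
    sampling the expectation is m * chi2_distance(p, q).
    """
    return sum(
        ((n_i - m * q_i) ** 2 - n_i) / (m * q_i)
        for n_i, q_i in zip(counts, q)
        if q_i > 0
    )


def chi2_vs_tv_test(counts, q, m, eps, threshold_factor=1.0):
    """Return True ("close") if the statistic is below threshold_factor * m * eps^2.

    threshold_factor is a hypothetical tuning constant, not taken from the source.
    """
    return chi2_statistic(counts, q, m) <= threshold_factor * m * eps ** 2


# Usage: q is uniform over 4 symbols and m = 100 samples were drawn.
q = [0.25, 0.25, 0.25, 0.25]
print(chi2_vs_tv_test([25, 25, 25, 25], q, m=100, eps=0.25))  # counts match q
print(chi2_vs_tv_test([100, 0, 0, 0], q, m=100, eps=0.25))    # all mass on one symbol
```

The threshold sits between the two hypotheses only when the "close" case has $\chi^2$-distance well below $\varepsilon^2$ and the "far" case has total variation distance at least $\varepsilon$, which gives $\chi^2 \ge 4\varepsilon^2$. Choosing $m$ large enough for the statistic to concentrate is the substance of the sample-complexity analysis and is not captured here.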
