Minimax Estimation of Discrete Distributions Under $\ell_1$ Loss

We consider the problem of discrete distribution estimation under $\ell_1$ loss. We provide tight upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator) and on the minimax risk, in regimes where the support size $S$ may grow with the number of observations $n$. We show that among distributions with bounded entropy $H$, the asymptotic maximum risk of the empirical distribution is $2H/\ln n$, while the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a hard-thresholding estimator that is oblivious to the unknown upper bound $H$ is essentially minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again $2H/\ln n$. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance ($\ell_1$ divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
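As a rough illustration of the objects compared above, the following Python sketch computes the empirical distribution, the $\ell_1$ loss, and a hard-thresholded version of the empirical distribution, and estimates the risk at one fixed distribution by Monte Carlo. The threshold $\tau = c \ln n / n$ and the helper names (`empirical_distribution`, `hard_threshold`, `monte_carlo_risk`) are hypothetical choices made for illustration; they are not the estimator specified in the paper. A simulation at a single $p$ also cannot reproduce the constants $2H/\ln n$ and $H/\ln n$, which describe the worst case over all distributions with entropy at most $H$ as $n$ grows.

```python
import math
import random
from collections import Counter


def empirical_distribution(samples, support_size):
    """Empirical distribution (the MLE): relative frequency of each symbol 0..S-1."""
    n = len(samples)
    counts = Counter(samples)
    return [counts.get(i, 0) / n for i in range(support_size)]


def hard_threshold(p_hat, n, c=1.0):
    """Zero out empirical probabilities below tau = c * ln(n) / n.

    HYPOTHETICAL threshold, for illustration only. The result is not
    renormalized, so it may lie outside the probability simplex.
    """
    tau = c * math.log(n) / n
    return [q if q >= tau else 0.0 for q in p_hat]


def l1_loss(p, q):
    """l1 distance sum_i |p_i - q_i| (twice the total variation distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def entropy_nats(p):
    """Shannon entropy in nats, so that H / ln(n) is a ratio of natural logs."""
    return -sum(x * math.log(x) for x in p if x > 0)


def monte_carlo_risk(p, n, estimator, trials=200, seed=0):
    """Average l1 loss of `estimator` over `trials` i.i.d. samples of size n from p."""
    rng = random.Random(seed)
    symbols = list(range(len(p)))
    total = 0.0
    for _ in range(trials):
        samples = rng.choices(symbols, weights=p, k=n)
        total += l1_loss(estimator(samples), p)
    return total / trials


if __name__ == "__main__":
    S, n = 1000, 500
    p = [1.0 / S] * S  # uniform on S symbols, so H = ln S nats

    def mle(samples):
        return empirical_distribution(samples, S)

    def thresholded(samples):
        return hard_threshold(empirical_distribution(samples, S), n)

    print("H / ln n            :", entropy_nats(p) / math.log(n))
    print("MLE risk at p       :", monte_carlo_risk(p, n, mle))
    print("threshold risk at p :", monte_carlo_risk(p, n, thresholded))
```

The thresholded vector generally sums to less than one. Per the abstract, restricting estimators to the simplex of probability distributions raises the asymptotic minimax risk back to $2H/\ln n$, so the gain from thresholding relies on being allowed to leave the simplex.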
