Minimax Optimal Additive Functional Estimation with Discrete Distribution: Slow Divergence Speed Case

This paper addresses the problem of estimating an additive functional from <tex>$n$</tex> i.i.d. samples drawn from a discrete distribution <tex>$P=(p_{1},\ldots,p_{k})$</tex> with alphabet size <tex>$k$</tex>. The additive functional is defined as <tex>$\theta(P;\phi)=\sum_{i=1}^{k}\phi(p_{i})$</tex> for a function <tex>$\phi$</tex>, a form that covers most entropy-like criteria. In a previous paper [1], we revealed that the minimax optimal rate for this problem is characterized by the divergence speed; however, that characterization is valid only when <tex>$\alpha\in(0,1)$</tex>, where <tex>$\alpha$</tex> denotes the parameter of the divergence speed. In this paper, we extend the characterization to a wider range of divergence speeds, covering <tex>$\alpha\in(1,3/2)$</tex> and <tex>$\alpha\in[3/2,2]$</tex>. As a result, we show that the minimax rates for <tex>$\alpha\in(1,3/2)$</tex> and <tex>$\alpha\in[3/2,2]$</tex> are <tex>$\frac{1}{n}+\frac{k^{2}}{(n\ln n)^{2\alpha}}$</tex> and <tex>$\frac{1}{n}$</tex>, respectively.
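To make the setting concrete, the sketch below shows the naive plug-in (maximum-likelihood) estimator for an additive functional: estimate each <tex>$p_{i}$</tex> by its empirical frequency and substitute it into <tex>$\theta(P;\phi)$</tex>. This is a minimal Python illustration, assuming Shannon entropy (<tex>$\phi(p)=-p\ln p$</tex>) as the choice of <tex>$\phi$</tex>; the names plugin_estimate and phi_entropy are illustrative, not from the paper.

import math
from collections import Counter

def plugin_estimate(samples, phi):
    # Empirical (maximum-likelihood) frequencies: p_hat_i = n_i / n.
    n = len(samples)
    counts = Counter(samples)
    # Plug the empirical frequencies into theta(P; phi) = sum_i phi(p_i).
    # Symbols never observed have p_hat_i = 0; for entropy phi(0) = 0,
    # so omitting them from the sum changes nothing.
    return sum(phi(c / n) for c in counts.values())

def phi_entropy(p):
    # Shannon entropy corresponds to phi(p) = -p * ln(p), with phi(0) = 0.
    return -p * math.log(p) if p > 0 else 0.0

samples = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
print(plugin_estimate(samples, phi_entropy))  # entropy estimate in nats

As the sample-complexity analyses in [7] and [19] indicate, this plug-in estimator is biased and falls short of the minimax rates above when <tex>$k$</tex> is large relative to <tex>$n$</tex>; the rate-optimal estimators in that line of work instead replace <tex>$\phi$</tex> with a best polynomial approximation on the small-probability region.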

[1] T. Cai, et al. Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. arXiv:1105.3039, 2011.

[2] Himanshu Tyagi, et al. The Complexity of Estimating Rényi Entropy. SODA, 2015.

[3] Yihong Wu, et al. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics, 2015.

[4] Yanjun Han, et al. Minimax estimation of the L1 distance. IEEE International Symposium on Information Theory (ISIT), 2016.

[5] Z. Ditzian, et al. Direct estimate for Bernstein polynomials. 1994.

[6] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. 1986.

[7] Yanjun Han, et al. Maximum Likelihood Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 2014.

[8] Yanjun Han, et al. Minimax rate-optimal estimation of KL divergence between discrete distributions. International Symposium on Information Theory and Its Applications (ISITA), 2016.

[9] A. Suresh, et al. The Complexity of Estimating Rényi Entropy. 2014.

[10] Gábor Lugosi, et al. Concentration Inequalities: A Nonasymptotic Theory of Independence. 2013.

[11] William Bialek, et al. Entropy and information in neural spike trains: progress on the sampling problem. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2003.

[12] G. A. Miller. Note on the bias of information estimates. 1955.

[13] V. Totik, et al. Moduli of Smoothness. 1987.

[14] P. Petrushev, et al. Rational Approximation of Real Functions. 1988.

[15] S. Zahl. Jackknifing an index of diversity. 1977.

[16] Liam Paninski. Estimating entropy on m bins given fewer than m samples. IEEE Transactions on Information Theory, 2004.

[17] L. Le Cam. Asymptotic methods in statistical theory. 1986.

[18] P. Grassberger. Finite sample corrections to entropy and dimension estimates. 1988.

[19] Yihong Wu, et al. Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation. IEEE Transactions on Information Theory, 2014.

[20] Donald F. Towsley, et al. Detecting anomalies in network traffic using maximum entropy estimation. IMC '05, 2005.

[21] Steffen Schober, et al. Some worst-case bounds for Bayesian estimators of discrete distributions. IEEE International Symposium on Information Theory, 2013.

[22] Yanjun Han, et al. Minimax Estimation of Functionals of Discrete Distributions. IEEE Transactions on Information Theory, 2014.

[23] Yanjun Han, et al. Does Dirichlet prior smoothing solve the Shannon entropy estimation problem? IEEE International Symposium on Information Theory (ISIT), 2015.

[24] J. Ross Quinlan. Induction of Decision Trees. Machine Learning, 1986.

[25] J. R. Moorman, et al. Accurate estimation of entropy in very short physiological time series: the problem of atrial fibrillation detection in implanted ventricular devices. American Journal of Physiology: Heart and Circulatory Physiology, 2011.

[26] Yingbin Liang, et al. Estimation of KL divergence between large-alphabet distributions. IEEE International Symposium on Information Theory (ISIT), 2016.

[27] A. Spitzbart. A Generalization of Hermite's Interpolation Formula. 1960.

[28] Maciej Skorski. On the complexity of estimating Rényi divergences. IEEE International Symposium on Information Theory (ISIT), 2017.

[29] Peter Grassberger. Entropy estimation of symbol sequences. Chaos, 1996.

[30] Gregory Valiant, et al. The Power of Linear Estimators. IEEE 52nd Annual Symposium on Foundations of Computer Science, 2011.

[31] Gregory Valiant, et al. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. STOC '11, 2011.

[32] Jun Sakuma, et al. Minimax optimal estimators for additive scalar functionals of discrete distributions. IEEE International Symposium on Information Theory (ISIT), 2017.

[33] Fuhui Long, et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003.

[34] A. Timan. Theory of Approximation of Functions of a Real Variable. 1994.

[35] J. Cooper, et al. Theory of Approximation. Mathematical Gazette, 1960.

[36] D. Holste, et al. Bayes' estimators of generalized entropies. 1998.