The complexity of approximating entropy

(MATH) We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array where the i-th location is the probability of the i-th element, then linear time is both necessary and sufficient for approximating the entropy.We consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a λ-multiplicative approximation to the entropy can be obtained in O(n(1+η)/λ2 < poly(log n)) time for distributions with entropy Ω(λ η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(nmax(1/(2λ2), 2/(5λ2—2)).We next consider a hybrid model in which both the explicit distribution as well as independent samples are available. Here, significantly more efficient algorithms can be achieved: a λ-multiplicative approximation to the entropy can be obtained in O(λ2.Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.

[1]  Ronitt Rubinfeld,et al.  Testing that distributions are close , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[2]  William Bialek,et al.  Entropy and Information in Neural Spike Trains , 1996, cond-mat/9603127.

[3]  David R. Wolf,et al.  Estimating functions of probability distributions from a finite set of samples. , 1994, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[4]  Shang‐keng Ma Calculation of entropy from data of motion , 1981 .

[5]  Ronitt Rubinfeld,et al.  Testing random variables for independence and identity , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[6]  Oded Goldreich,et al.  Comparing entropies in statistical zero knowledge with applications to the structure of SZK , 1999, Proceedings. Fourteenth Annual IEEE Conference on Computational Complexity (Formerly: Structure in Complexity Theory Conference) (Cat.No.99CB36317).

[7]  B. Harris The Statistical Estimation of Entropy in the Non-Parametric Case , 1975 .