We consider the problem of approximating the entropy of a discrete distribution under several models of access to the distribution. If the distribution is given explicitly as an array whose $i$-th location is the probability of the $i$-th element, then linear time is both necessary and sufficient for approximating the entropy. We next consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a $\lambda$-multiplicative approximation to the entropy can be obtained in $O(n^{(1+\eta)/\lambda^2} \cdot \mathrm{poly}(\log n))$ time for distributions with entropy $\Omega(\lambda/\eta)$, where $n$ is the size of the domain of the distribution and $\eta$ is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of $\Omega\big(n^{\max(1/(2\lambda^2),\, 2/(5\lambda^2 - 2))}\big)$. We next consider a hybrid model in which both the explicit distribution and independent samples are available. Here, significantly more efficient algorithms can be achieved: a $\lambda$-multiplicative approximation to the entropy can be obtained in time polylogarithmic in $n$ for distributions of sufficiently large entropy. Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
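As a concrete illustration of the two access models (this is not the paper's algorithm): in the explicit model the entropy $H(p) = -\sum_i p_i \log_2 p_i$ can be computed exactly in one linear pass over the probability array, which is why linear time suffices, whereas in the sampling model the algorithm sees only draws from the distribution. The minimal Python sketch below shows the exact linear-time computation alongside the naive empirical ("plug-in") estimator; the function names are hypothetical, and the plug-in estimator is only a baseline, not the multiplicative-approximation algorithm analyzed in the paper.

```python
import math
import random
from collections import Counter

def exact_entropy(p):
    """Exact Shannon entropy (in bits) of an explicitly given distribution.

    A single linear pass over the probability array, matching the
    "linear time is sufficient" claim in the explicit model.
    """
    return -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)

def plugin_entropy(samples):
    """Naive plug-in estimate of the entropy from independent samples.

    A baseline for the sampling model only, NOT the paper's
    multiplicative-approximation algorithm; the plug-in estimator
    is known to be biased when the sample size is small.
    """
    counts = Counter(samples)
    m = len(samples)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

if __name__ == "__main__":
    # A small example distribution over a domain of size n = 4.
    p = [0.5, 0.25, 0.125, 0.125]
    print(exact_entropy(p))  # 1.75 bits

    # Independent samples from p, as in the sampling model.
    samples = random.choices(range(len(p)), weights=p, k=10_000)
    print(plugin_entropy(samples))  # close to 1.75 for large sample sizes
```

The gap between these two routines is what the sampling-model results quantify: how many samples (and how much time) are needed before an estimate computed only from draws is guaranteed to be within a $\lambda$ multiplicative factor of the true entropy.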