Consistency of the plug-in estimator of the entropy rate for ergodic processes

A plug-in estimator of entropy is the entropy of the distribution in which the probabilities of symbols or blocks are replaced by their relative frequencies in the sample. Consistency and asymptotic unbiasedness of the plug-in estimator are easily demonstrated in the IID case. In this paper, we ask whether the plug-in estimator can be used for consistent estimation of the entropy rate h of a stationary ergodic process. The answer is positive if, to estimate the block entropy of order k, we use a sample longer than 2^{k(h+ϵ)}, whereas it is negative if we use a sample shorter than 2^{k(h-ϵ)}. In particular, if we do not know the entropy rate h, it is sufficient to use a sample of length (|X| + ϵ)^k, where |X| is the alphabet size. The result is derived using k-block coding. As a by-product of our technique, we also show that the block entropy of a stationary process is bounded above by a nonlinear function of the average block entropy of its ergodic components. This inequality can be used for an alternative proof of the known fact that the entropy rate of a stationary process equals the average entropy rate of its ergodic components.
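For illustration, here is a minimal sketch (in Python, not taken from the paper) of the plug-in estimator described above: the order-k block entropy computed from relative frequencies of k-blocks, and the derived entropy-rate estimate H(k)/k. The function names and the choice of overlapping k-blocks are assumptions made here for the example, not details fixed by the abstract.

```python
from collections import Counter
from math import log2

def plugin_block_entropy(sample, k):
    """Plug-in estimate of the order-k block entropy:
    the entropy of the empirical distribution of k-blocks,
    with probabilities replaced by relative frequencies."""
    n = len(sample) - k + 1  # number of (overlapping) k-blocks in the sample
    counts = Counter(tuple(sample[i:i + k]) for i in range(n))
    return -sum((c / n) * log2(c / n) for c in counts.values())

def plugin_entropy_rate(sample, k):
    """Entropy-rate estimate H(k)/k; per the abstract, consistency can be
    expected when the sample is longer than 2^{k(h+eps)}."""
    return plugin_block_entropy(sample, k) / k
```

As a usage note, for an unknown entropy rate the abstract's sufficient condition suggests picking k so that the sample length exceeds (|X| + ϵ)^k, e.g. k on the order of log(n) / log(|X| + ϵ) for a sample of length n.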
