Ergodic decomposition of excess entropy and conditional mutual information

The article discusses excess entropy, defined as the mutual information between the past and the future of a stationary process. The central result is an ergodic decomposition: excess entropy is the sum of the self-information of the shift-invariant σ-field and the average of the excess entropies of the ergodic components of the process. The result is derived using a generalized conditional mutual information for fields of events, developed anew in the paper. One corollary of the ergodic decomposition is that excess entropy is infinite for the class of processes with uncountably many ergodic components, called here uncountable description processes (UDPs). UDPs can be defined without the use of measure theory, and the article argues for their potential utility in linguistics. Moreover, it is shown that finite-order excess entropies (certain approximations of excess entropy) are dominated by the expected excess lengths of any universal code. Hence, universal codes may be used for rough estimation of excess entropy. Nevertheless, the excess code lengths diverge to infinity for almost every process with zero excess entropy, which is another corollary of the ergodic decomposition.
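For concreteness, the following LaTeX sketch records the standard definitions behind the abstract and the shape of the stated decomposition. The notation is assumed here rather than taken from the paper: $\mathcal{I}$ denotes the shift-invariant σ-field and $I(\mathcal{I};\mathcal{I})$ its self-information.

% Excess entropy of a stationary process (X_k): mutual information
% between the infinite past and the infinite future.
\[
  E \;=\; I\bigl(X_{-\infty}^{0};\, X_{1}^{\infty}\bigr).
\]
% Finite-order excess entropy: mutual information between two adjacent
% blocks of length n, expressible through block entropies.
\[
  E_n \;=\; I\bigl(X_{1}^{n};\, X_{n+1}^{2n}\bigr)
       \;=\; 2\,H\bigl(X_{1}^{n}\bigr) - H\bigl(X_{1}^{2n}\bigr).
\]
% Ergodic decomposition as stated in the abstract (assumed notation):
% the self-information of the shift-invariant sigma-field plus a
% conditional mutual information that averages the excess entropies
% of the ergodic components.
\[
  E \;=\; I\bigl(\mathcal{I};\mathcal{I}\bigr)
      \;+\; I\bigl(X_{-\infty}^{0};\, X_{1}^{\infty} \,\big|\, \mathcal{I}\bigr).
\]
% If the process has uncountably many ergodic components, the first
% term is infinite, and hence E is infinite (the UDP corollary).

In this reading, the finite-order quantities $E_n$ are the approximations that, according to the abstract, are dominated by the expected excess lengths of any universal code, which is what permits rough compression-based estimates of excess entropy.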
