Hilberg Exponents: New Measures of Long Memory in the Process

The paper concerns the rates of hyperbolic growth of mutual information computed for a stationary measure or for a universal code. The rates are called Hilberg exponents, and four such quantities are defined for each measure and each code: two random exponents and two expected exponents. A particularly interesting case arises for conditional algorithmic mutual information. In this case, the random Hilberg exponents are almost surely constant on ergodic sources and are bounded by the expected Hilberg exponents. This property is a “second-order” analogue of the Shannon-McMillan-Breiman theorem, proved without invoking the ergodic theorem. It carries over to Hilberg exponents for the underlying probability measure via Shannon-Fano coding and the Barron inequality. Moreover, the expected Hilberg exponents can be linked for different universal codes. Namely, if one code dominates another, the expected Hilberg exponents are greater for the former than for the latter. The paper concludes with an evaluation of Hilberg exponents for certain sources, such as the Bayesian Bernoulli process and the Santa Fe processes.
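For orientation, here is a minimal sketch of the kind of quantity involved, assuming the limsup-type definition used in Dębowski's related work; the paper's exact formulation may differ. The Hilberg exponent of a nonnegative sequence S(n) can be taken as

\[
  \operatorname{hilberg}_{n\to\infty} S(n)
  \;=\;
  \limsup_{n\to\infty} \frac{\log^{+} S(n)}{\log n},
  \qquad
  \log^{+} x := \max(\log x,\, 0).
\]

Taking S(n) to be the block mutual information I(X_1^n ; X_{n+1}^{2n}) under the stationary measure, or its analogue for a universal code, hyperbolic growth of order n^{\beta} then corresponds to a Hilberg exponent equal to \beta.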
