Entropy estimation of symbol sequences.

We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long-range correlations. In particular, we consider algorithms that estimate h from the code lengths produced by a compression algorithm. Our interest is in describing how these estimates converge with sequence length, assuming no limits on the space and time complexity of the compression algorithms. A scaling law is proposed for extrapolation from finite sample lengths. It is applied to sequences from dynamical systems in non-trivial chaotic regimes, to a 1-D cellular automaton, and to written English texts. © 1996 American Institute of Physics.
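
As a rough illustration of estimating h from code lengths, the sketch below divides the compressed size of a sequence prefix by the number of symbols in that prefix, for several prefix lengths. It uses Python's standard zlib (DEFLATE) compressor purely as a stand-in; this is an assumption made for the example, not the compression scheme studied in the paper, and the function names are hypothetical. Because any real compressor carries overhead, the values are upper-bound estimates that can be expected to approach h only slowly as the length grows.

```python
import random
import zlib


def compression_entropy_estimate(symbols: bytes) -> float:
    """Return compressed size in bits divided by the number of symbols.

    This is an upper-bound style estimate of the entropy per symbol,
    using zlib as an illustrative (not optimal) compressor.
    """
    if not symbols:
        raise ValueError("need at least one symbol")
    compressed = zlib.compress(symbols, 9)  # level 9 = strongest zlib setting
    return 8.0 * len(compressed) / len(symbols)


def estimates_by_length(symbols: bytes, lengths):
    """Estimate bits per symbol on prefixes of the given lengths.

    Lengths longer than the sequence are skipped.
    """
    return [
        (n, compression_entropy_estimate(symbols[:n]))
        for n in lengths
        if n <= len(symbols)
    ]


if __name__ == "__main__":
    # Hypothetical test sequence: independent fair choices between two
    # symbols, so the true entropy is 1 bit per symbol.
    rng = random.Random(0)
    seq = bytes(rng.choice(b"ab") for _ in range(1 << 16))
    for n, h in estimates_by_length(seq, [1 << k for k in range(8, 17, 2)]):
        print(n, round(h, 3))
```

Tabulating the estimate against prefix length is the kind of finite-length data to which an extrapolation such as the proposed scaling law would be fitted; the form of that law is not given in the abstract, so no fit is attempted here.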
