Word frequency and entropy of symbolic sequences: a dynamical perspective

Abstract. Symbolic sequences generated by nonlinear dynamics, a German text and a piece of classical music are investigated. The higher order block entropies and the mean uncertainty are calculated using both analytical and numerical methods. The existence of weak long memory effects and the corresponding scaling of the entropies are explored. The hypothesis is developed that for language-like processes the block entropies increase in a sublinear way with the word length $n$, i.e. $H_n \sim a n^{\mu}$ with exponents $\mu$ in the range of roughly $1/4$ to $1/2$. Correspondingly the effective number of words follows a stretched exponential law.
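
To make the quantities in the abstract concrete, the following is a minimal sketch of how a block entropy $H_n$ might be estimated numerically from a finite symbol sequence. It is not the authors' procedure: it uses the naive plug-in (relative-frequency) estimator over overlapping blocks, which is known to underestimate $H_n$ when the number of possible words approaches the sequence length. The function names `block_entropy` and `effective_words`, and the choice of base-2 logarithms (bits), are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def block_entropy(seq, n):
    """Plug-in estimate of the block entropy H_n, in bits (assumed unit).

    Counts all overlapping words of length n in seq and applies
    H_n = -sum_w p(w) * log2 p(w) with p(w) the relative frequency of w.
    """
    if n < 1 or n > len(seq):
        raise ValueError("n must satisfy 1 <= n <= len(seq)")
    # Overlapping windows; tuple() makes slices of lists or strings hashable.
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def effective_words(seq, n):
    """Effective number of words of length n, taken here as 2 ** H_n."""
    return 2 ** block_entropy(seq, n)


if __name__ == "__main__":
    # Toy input: a strictly periodic binary sequence, so H_n stays near 1 bit.
    sample = "01" * 50
    for n in range(1, 5):
        print(n, block_entropy(sample, n), effective_words(sample, n))
```

Under the scaling hypothesis $H_n \sim a n^{\mu}$, the effective number of words computed this way would grow like the exponential of $a n^{\mu}$, which is the stretched exponential law mentioned above. For real texts the estimate is only trustworthy for small $n$ relative to the sequence length.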