Entropy analysis of substitutive sequences revisited

Any finite sequence of letters over a finite alphabet can be generated algorithmically, in particular by a Turing machine. This fact is at the heart of complexity theory in the sense of Kolmogorov and Chaitin. A relevant question in this context is whether, given a statistically ‘sufficiently long’ sequence, there exists a deterministic finite automaton that generates it. In this paper we propose a simple criterion, based on measuring block entropies by lumping, which is satisfied by all automatic sequences. Using this criterion, one can show that a given sequence is not automatic, and obtain additional information about the sequence when it is automatic. Following previous work on the Feigenbaum sequence, we give a necessary entropy-based condition valid for all automatic sequences read by lumping. We discuss applications of these ideas to representative examples. In particular, we establish new entropic decimation schemes for the Thue–Morse, the Rudin–Shapiro and the paperfolding sequences read by lumping.
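
As a rough illustration of what measuring block entropies "by lumping" means, the sketch below cuts a sequence into consecutive non-overlapping blocks of length n and computes the Shannon entropy of the resulting block frequencies. It is a minimal sketch, not the paper's criterion: the function names `thue_morse` and `lumped_block_entropy`, the use of base-2 logarithms, and the choice of prefix length are assumptions made here for illustration only.

```python
from collections import Counter
from math import log2


def thue_morse(length):
    """First `length` terms of the Thue-Morse sequence.

    Term i is the parity of the number of 1 bits in the binary expansion of i.
    """
    return [bin(i).count("1") % 2 for i in range(length)]


def lumped_block_entropy(seq, n):
    """Shannon entropy (in bits) of length-n blocks read by lumping.

    "Lumping" here means the sequence is cut into consecutive,
    non-overlapping blocks of length n (a trailing partial block is dropped),
    as opposed to sliding a window one symbol at a time.
    """
    blocks = [tuple(seq[i:i + n]) for i in range(0, len(seq) - n + 1, n)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    seq = thue_morse(2 ** 12)  # illustrative prefix length (an assumption)
    for k in range(1, 6):
        n = 2 ** k
        print(n, lumped_block_entropy(seq, n))
```

For the Thue–Morse sequence, blocks of length 2^k read by lumping are the images of the two letters under the k-th iterate of the substitution 0 → 01, 1 → 10, so only two distinct blocks occur. On a prefix of length 2^12 they occur equally often, and the lumped entropy stays at 1 bit for each of the block lengths printed above.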
