Word Complexity And Repetitions In Words

Drawing on ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, R(w), is defined as the smallest amount of space needed to store w when it is reduced by repeatedly applying the following procedure: n consecutive occurrences uu…u of the same subword u of w are stored as (u,n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, SUB, and Lempel-Ziv complexity, LZ. We always have R(w)≥LZ(w), and it can even happen that the former is linear while the latter is only logarithmic; for example, this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α is ultimately periodic if and only if any one of the following holds: (i) , (ii) , and (iii) . De Bruijn words, well known for their high subword complexity, are shown to have almost the highest repetition complexity; their precise complexity remains open. R(w) can be computed in time , and finding fast algorithms is open and probably very difficult.
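To make the reduction (u,n) concrete, here is a minimal Python sketch that searches for the cheapest nested encoding of a short word by dynamic programming over its subwords. It is not the authors' algorithm. The abstract does not state how stored space is measured, so the cost model is an assumption made for illustration: each letter costs 1, and a power (u,n) costs the cost of u plus the number of decimal digits of n. The name `repetition_cost` is hypothetical.

```python
from functools import lru_cache


def repetition_cost(w: str) -> int:
    """Cheapest nested encoding of w under an ASSUMED cost model:
    a letter costs 1; a power (u, n) costs cost(u) + decimal digits of n."""

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        length = j - i
        if length <= 1:
            return length
        # Baseline: store the block w[i:j] letter by letter.
        best = length
        # Option 1: split the block in two and encode each part separately.
        for k in range(i + 1, j):
            best = min(best, cost(i, k) + cost(k, j))
        # Option 2: the whole block is a power u^m with m >= 2; store (u, m).
        block = w[i:j]
        for p in range(1, length // 2 + 1):
            if length % p == 0 and block[:p] * (length // p) == block:
                m = length // p
                best = min(best, cost(i, i + p) + len(str(m)))
        return best

    return cost(0, len(w))


if __name__ == "__main__":
    print(repetition_cost("abc"))        # 3: no repetition to exploit
    print(repetition_cost("abab"))       # 3: stored as (ab, 2)
    print(repetition_cost("aaaa"))       # 2: stored as (a, 4)
    print(repetition_cost("abcabcabc"))  # 4: stored as (abc, 3)
```

Under this assumed model the sketch takes polynomial time but is far from fast, which matches the remark that efficient algorithms for R(w) are an open problem. A different cost for the exponent n would change the values but not the structure of the search.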
