Paper information - Word Complexity And Repetitions In Words

Word Complexity And Repetitions In Words

Drawing on ideas from data compression and combinatorics on words, we introduce a complexity measure for words, called repetition complexity, which quantifies the amount of repetition in a word. The repetition complexity of w, R(w), is defined as the smallest amount of space needed to store w when it is reduced by repeatedly applying the following procedure: n consecutive occurrences uu…u of the same subword u of w are stored as (u,n). The repetition complexity has interesting relations with well-known complexity measures, such as subword complexity, SUB, and Lempel-Ziv complexity, LZ. We always have R(w) ≥ LZ(w), and it can even happen that the former is linear while the latter is only logarithmic; for example, this happens for prefixes of certain infinite words obtained by iterated morphisms. An infinite word α being ultimately periodic is equivalent to: (i) […], (ii) […], and (iii) […]. De Bruijn words, well known for their high subword complexity, are shown to have almost the highest repetition complexity; the precise complexity remains open. R(w) can be computed in time […]; finding fast algorithms remains open and is probably very difficult.
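To make the reduction "uu…u is stored as (u,n)" concrete, here is a minimal Python sketch that computes a repetition-style cost by dynamic programming over subwords. It is not the algorithm from the paper. The cost model is an assumption, since the abstract does not fix one: each stored letter costs 1, and each exponent n costs its number of decimal digits. The function name `repetition_cost` is hypothetical.

```python
from functools import lru_cache


def repetition_cost(w: str) -> int:
    """Smallest cost of storing w using nested (u, n) power reductions.

    ASSUMED cost model (not taken from the abstract): every stored letter
    costs 1 and every exponent n costs len(str(n)), its decimal digit count.
    """
    if not w:
        return 0

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        # Cheapest encoding of the subword w[i:j], with j > i.
        length = j - i
        if length == 1:
            return 1
        # Storing the subword letter by letter is always possible.
        best = length
        # Option 1: split into two independently encoded pieces.
        for k in range(i + 1, j):
            best = min(best, cost(i, k) + cost(k, j))
        # Option 2: the whole subword is a power u^n with n >= 2;
        # store it as (u, n), paying for u once plus the digits of n.
        for p in range(1, length // 2 + 1):
            if length % p == 0:
                n = length // p
                if w[i:i + p] * n == w[i:j]:
                    best = min(best, cost(i, i + p) + len(str(n)))
        return best

    return cost(0, len(w))


if __name__ == "__main__":
    for word in ["abcd", "abab", "aaaa", "aabaab"]:
        print(word, repetition_cost(word))
```

Under this assumed cost model, "abab" is stored as (ab,2) at cost 3, "aaaa" as (a,4) at cost 2, and "abcd" has no repetition and costs 4. The sketch explores every split and every period of every subword, so it is only practical for short words. It illustrates the definition and is not an efficient method.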
