Combinatorics of periods in strings

We consider the set Γn of all period sets of strings of length n over a finite alphabet. We show that period sets contain redundancy and introduce the notion of an irreducible period set. We prove that Γn is a lattice under set inclusion and that it does not satisfy the Jordan-Dedekind condition. We propose the first efficient enumeration algorithm for Γn and improve upon the previously known asymptotic lower bounds on the cardinality of Γn. Finally, we provide a new recurrence to compute the number of strings sharing a given period set, and exhibit an algorithm to sample period sets uniformly via irreducible period sets.
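To make the objects concrete, the following is a minimal brute-force sketch, not the efficient enumeration algorithm announced above. It assumes the convention that p is a period of a string w of length n when 0 ≤ p < n and w[i] = w[i+p] for all 0 ≤ i < n − p, so that 0 is always a period. It also assumes that a binary alphabet suffices to realize every period set, a classical result on periods in strings. The names `period_set` and `all_period_sets` are hypothetical and chosen for illustration only.

```python
from itertools import product


def period_set(w):
    """Return the set of periods of w (assumed convention: 0 <= p < len(w))."""
    n = len(w)
    return frozenset(
        p for p in range(n)
        if all(w[i] == w[i + p] for i in range(n - p))
    )


def all_period_sets(n, alphabet="ab"):
    """Brute-force the set of period sets of all length-n strings.

    Exponential in n; only meant for checking small cases by hand.
    """
    return {period_set(w) for w in product(alphabet, repeat=n)}


if __name__ == "__main__":
    for n in range(1, 6):
        sets = sorted(sorted(s) for s in all_period_sets(n))
        print(n, len(sets), sets)
```

For n = 4 this sketch finds four period sets, namely {0}, {0,2}, {0,3} and {0,1,2,3}. The set {0,2,3} does not appear, because periods 2 and 3 together force all four letters to be equal, which in turn forces period 1.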
