On maximal repeats in strings

Abstract: In this note we clarify the relationship between the maximal repeats in a string p and the compact suffix automaton CSA(p) built on p. The maximal repeats turn out to be the longest strings reaching each internal state of CSA(p). This result makes it possible to derive the maximum and the average number of maximal repeats (under a model in which the characters of p are independent and equiprobable) from earlier studies on the size of CSA(p). It also yields a simpler algorithm for enumerating all the maximal repeats in p, one that uses less memory.
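The correspondence stated in the abstract can be checked on small inputs. The Python sketch below is only an illustration under stated assumptions and is not the enumeration algorithm of the note. It builds the ordinary (non-compact) suffix automaton and keeps the states that compaction would retain as internal states: non-initial states with at least two outgoing transitions, or with at least one outgoing transition while also being terminal. It then compares the longest string of each such state with a brute-force enumeration. The assumed definition is that a maximal repeat is a non-empty substring occurring at least twice (overlaps allowed) whose every one-character extension, to the left or to the right, occurs strictly fewer times. All names (`State`, `build_suffix_automaton`, `maximal_repeats_via_automaton`, `maximal_repeats_brute_force`) are hypothetical.

```python
class State:
    def __init__(self, length, link, first_end):
        self.length = length        # length of the longest string reaching the state
        self.link = link            # suffix link (state index), -1 for the initial state
        self.first_end = first_end  # end position in p of one occurrence of that string
        self.next = {}              # outgoing transitions: character -> state index


def build_suffix_automaton(p):
    """Standard online construction of the (non-compact) suffix automaton."""
    states = [State(0, -1, -1)]
    last = 0
    for i, c in enumerate(p):
        cur = len(states)
        states.append(State(states[last].length + 1, -1, i))
        v = last
        while v != -1 and c not in states[v].next:
            states[v].next[c] = cur
            v = states[v].link
        if v == -1:
            states[cur].link = 0
        else:
            q = states[v].next[c]
            if states[v].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q: the clone receives the shorter strings of q.
                clone = len(states)
                copy = State(states[v].length + 1, states[q].link, states[q].first_end)
                copy.next = dict(states[q].next)
                states.append(copy)
                while v != -1 and states[v].next.get(c) == q:
                    states[v].next[c] = clone
                    v = states[v].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    # Terminal states are those on the suffix-link path from the last state.
    terminal = set()
    v = last
    while v != -1:
        terminal.add(v)
        v = states[v].link
    return states, terminal


def maximal_repeats_via_automaton(p):
    """Longest strings of the states that would be internal after compaction."""
    states, terminal = build_suffix_automaton(p)
    repeats = set()
    for v, s in enumerate(states):
        if v == 0:
            continue  # the initial state carries only the empty string
        if len(s.next) >= 2 or (len(s.next) >= 1 and v in terminal):
            repeats.add(p[s.first_end - s.length + 1 : s.first_end + 1])
    return repeats


def count_occurrences(text, word):
    """Number of (possibly overlapping) occurrences of word in text."""
    return sum(1 for i in range(len(text) - len(word) + 1) if text.startswith(word, i))


def maximal_repeats_brute_force(p):
    """Direct application of the assumed definition; for checking only."""
    alphabet = set(p)
    substrings = {p[i:j] for i in range(len(p)) for j in range(i + 1, len(p) + 1)}
    repeats = set()
    for w in substrings:
        occ = count_occurrences(p, w)
        if occ >= 2 and all(
            count_occurrences(p, a + w) < occ and count_occurrences(p, w + a) < occ
            for a in alphabet
        ):
            repeats.add(w)
    return repeats
```

For example, on p = "banana" both functions return the set containing "a" and "ana". The brute-force version is far slower than the automaton version and is meant only as a reference for small strings.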
