Simple and flexible detection of contiguous repeats using a suffix tree

We study the problem of detecting all occurrences of (primitive) tandem repeats and tandem arrays in a string. We first give a simple time- and space-optimal algorithm to find all tandem repeats, and then modify it to obtain a time- and space-optimal algorithm for finding only the primitive tandem repeats. Both algorithms are then extended to handle tandem arrays. The contribution of this paper is both pedagogical and practical: it gives simple algorithms and implementations based on a suffix tree, using only standard tree traversal techniques.
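To make the terms concrete, here is a minimal brute-force sketch of what is being detected. It is not the suffix-tree algorithm of the paper and is far from optimal (roughly cubic time); it only illustrates the definitions. It assumes the usual ones: a tandem repeat is a substring of the form αα, and it is primitive when α is not itself a power of a shorter string. The function names `is_primitive` and `tandem_repeats` are hypothetical, chosen for this illustration.

```python
def is_primitive(w: str) -> bool:
    """Return True if w is non-empty and not a power u^k (k >= 2) of a shorter string u."""
    # Standard test: w is primitive iff its first occurrence in w + w,
    # searching from index 1, is at index len(w).
    return len(w) > 0 and (w + w).find(w, 1) == len(w)


def tandem_repeats(s: str, primitive_only: bool = False):
    """Naive reference enumeration (illustrative only, not the paper's method).

    Returns (start, half_length) pairs such that
    s[start:start+h] == s[start+h:start+2*h], i.e. an occurrence of a square.
    """
    n = len(s)
    found = []
    for start in range(n):
        # h is the length of alpha; the repeat alpha+alpha must fit in s.
        for h in range(1, (n - start) // 2 + 1):
            alpha = s[start:start + h]
            if alpha == s[start + h:start + 2 * h]:
                if not primitive_only or is_primitive(alpha):
                    found.append((start, h))
    return found


if __name__ == "__main__":
    print(tandem_repeats("aaaa"))                       # all tandem repeats
    print(tandem_repeats("aaaa", primitive_only=True))  # only primitive ones
```

In "aaaa", the occurrence starting at position 0 with α = "aa" is a tandem repeat but not a primitive one, since "aa" is itself a repetition of "a"; the primitive-only call filters it out.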
