A Linear-Time Algorithm for Seeds Computation

A seed in a word is a relaxed version of a period. We present a linear-time algorithm that computes a compact representation of all the seeds of a word and, in particular, the shortest seed. This solves an open problem stated in the survey by Smyth (2000) and improves upon the O(n log n) algorithm by Iliopoulos, Moore and Park (1996), which had stood for over 15 years. Our approach is based on combinatorial relations between seeds and a variant of the LZ-factorization (used here for the first time in the context of seeds).
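
To make the notion of a seed concrete, the following is a minimal brute-force sketch, not the linear-time algorithm of the paper. It assumes the usual definition from the literature: s is a seed of w if s is a factor of w and w is a factor of some word covered by s. Equivalently, every position of w lies inside an occurrence of s in w or inside a copy of s that overhangs the left or right end of w. The function names `is_seed` and `shortest_seeds` are hypothetical and chosen only for illustration.

```python
def is_seed(s: str, w: str) -> bool:
    """Naive test whether s is a seed of w (assumed definition: s is a
    factor of w, and copies of s, possibly overhanging either end of w,
    cover every position of w)."""
    n, m = len(w), len(s)
    if m == 0 or s not in w:
        return False
    covered = 0  # positions 0 .. covered-1 of w are covered so far
    # Try every alignment of s against w, including overhanging ones.
    for i in range(-(m - 1), n):
        if i > covered:
            return False  # position `covered` can no longer be covered
        lo, hi = max(i, 0), min(i + m, n)
        # The alignment is valid if s agrees with w on the overlap.
        if w[lo:hi] == s[lo - i:hi - i]:
            covered = max(covered, i + m)
    return covered >= n


def shortest_seeds(w: str) -> list[str]:
    """All seeds of minimal length, found by exhaustive search."""
    n = len(w)
    for m in range(1, n + 1):
        factors = {w[i:i + m] for i in range(n - m + 1)}
        found = sorted(f for f in factors if is_seed(f, w))
        if found:
            return found
    return []


print(is_seed("aba", "aabaab"))   # "aabaab" is a factor of "abaabaaba"
print(shortest_seeds("aabaab"))
```

This search takes polynomial time, far from the linear bound stated above; it is useful only as a reference for checking small examples by hand.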

[1] Maxime Crochemore, et al. Algorithms on strings, 2007.

[2] Wojciech Rytter, et al. Efficient algorithms for three variants of the LPF table, 2012, J. Discrete Algorithms.

[3] Peter Weiner, et al. Linear Pattern Matching Algorithms, 1973, SWAT.

[4] Peter Sanders, et al. Linear work suffix array construction, 2006, JACM.

[5] Robert E. Tarjan, et al. Fast Algorithms for Finding Nearest Common Ancestors, 1984, SIAM J. Comput.

[6] Dany Breslauer, et al. An On-Line String Superprimitivity Test, 1992, Inf. Process. Lett.

[7] S. Muthukrishnan, et al. Perfect Hashing for Strings: Formalization and Algorithms, 1996, CPM.

[8] Yin Li, et al. Computing the Cover Array in Linear Time, 2001, Algorithmica.

[9] Costas S. Iliopoulos, et al. Covering a String, 1993, CPM.

[10] Gregory Kucherov, et al. Finding maximal repetitions in a word in linear time, 1999, 40th Annual Symposium on Foundations of Computer Science.

[11] M. Farach. Optimal suffix tree construction with large alphabets, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[12] H. Wilf, et al. Uniqueness theorems for periodic functions, 1965.

[13] Maxime Crochemore, et al. Transducers and Repetitions, 1986, Theor. Comput. Sci.

[14] S. Muthukrishnan, et al. On the sorting-complexity of suffix tree construction, 2000, JACM.

[15] J. Allouche. Algebraic Combinatorics on Words, 2005.

[16] M. Lothaire, et al. Applied Combinatorics on Words, 2005.

[17] Costas S. Iliopoulos, et al. Computing the lambda-Seeds of a String, 2006, AAIM.

[18] Kunsoo Park, et al. Finding Approximate Covers of Strings, 2002.

[19] J. Davenport. Editor, 1960.

[20] Robert E. Tarjan, et al. A linear-time algorithm for a special case of disjoint set union, 1983, J. Comput. Syst. Sci.

[21] William F. Smyth, et al. Computing the covers of a string in linear time, 1994, SODA '94.

[22] Costas S. Iliopoulos, et al. Computing the Minimum Approximate lambda-Cover of a String, 2006, SPIRE.

[23] Maxime Crochemore, et al. Computing Longest Previous non-overlapping Factors, 2011, Inf. Process. Lett.

[24] Richard Cole, et al. The Complexity of the Minimum k-Cover Problem, 2005, J. Autom. Lang. Comb.

[25] Gad M. Landau, et al. Dynamic text and static pattern matching, 2007, TALG.

[26] Costas S. Iliopoulos, et al. Computing the lambda-covers of a string, 2007, Inf. Sci.

[27] Costas S. Iliopoulos, et al. Covering a string, 2005, Algorithmica.

[28] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[29] Jeong Seop Sim, et al. Approximate Seeds of Strings, 2003, Stringology.

[30] William F. Smyth, et al. An Optimal Algorithm to Compute all the Covers of a String, 1994, Inf. Process. Lett.

[31] Wojciech Rytter, et al. Jewels of stringology, 2002.

[32] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.

[33] Andrzej Ehrenfeucht, et al. Efficient Detection of Quasiperiodicities in Strings, 1993, Theor. Comput. Sci.

[34] Costas S. Iliopoulos, et al. New complexity results for the k-covers problem, 2011, Inf. Sci.

[35] Christian N. S. Pedersen, et al. Finding Maximal Quasiperiodicities in Strings, 1999, CPM.

[36] William F. Smyth, et al. Repetitive perhaps, but certainly not boring, 2000, Theor. Comput. Sci.

[37] M. Crochemore, et al. Algorithms on Strings: Tools, 2007.

[38] Costas S. Iliopoulos, et al. The subtree max gap problem with application to parallel string covering, 1994, SODA '94.

[39] Wojciech Rytter, et al. Efficient Seeds Computation Revisited, 2011, CPM.

[40] Costas S. Iliopoulos, et al. Optimal Superprimitivity Testing for Strings, 1991, Inf. Process. Lett.

[41] William F. Smyth, et al. A Correction to "An Optimal Algorithm to Compute all the Covers of a String", 1995, Inf. Process. Lett.

[42] Costas S. Iliopoulos, et al. Quasiperiodicity: From Detection to Normal Forms, 1998, J. Autom. Lang. Comb.

[43] Robert E. Tarjan, et al. Scaling and related techniques for geometry problems, 1984, STOC '84.

[44] Michael G. Main, et al. Detecting leftmost maximal periodicities, 1989, Discret. Appl. Math.