Extracting powers and periods in a word from its runs structure

A breakthrough in the field of text algorithms was the discovery that the maximal number of runs in a word of length n is O(n) and that all of them can be computed in O(n) time. We study some applications of this result. New, simpler O(n)-time algorithms are presented for classical textual problems: computing all distinct k-th word powers for a given k (in particular squares, for k=2) and finding all local periods in a given word of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a word and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in the existing literature (approximately one page in the paper of Kolpakov and Kucherov, 1999 [25,26]). In this paper we attempt to fill this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool for computing powers. For problems related to periods we use some versions of the Manhattan skyline problem.
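To make the notions in the abstract concrete, here is a minimal Python sketch of what a run is and how squares and primitivity relate to it. It is a brute-force illustration and not the O(n) algorithms of the paper: it takes roughly quadratic time or worse. The function names (`naive_runs`, `distinct_squares`, `is_primitive`) and the 0-based triple `(start, end, period)` with inclusive `end` are assumptions made for this example only.

```python
def naive_runs(w):
    """Return all runs of w as sorted (start, end, period) triples, end inclusive.

    A run is a maximal factor whose smallest period p satisfies
    length >= 2 * p. This is a brute-force scan over all periods,
    not a linear-time method.
    """
    n = len(w)
    seen = set()   # intervals already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            a = k
            # Extend the stretch of positions k with w[k] == w[k + p].
            while k + p < n and w[k] == w[k + p]:
                k += 1
            end = k + p - 1  # w[a..end] has period p and is maximal for p
            # Long enough for two full periods, and not already found
            # with a smaller period (then p would not be the smallest).
            if k - a >= p and (a, end) not in seen:
                seen.add((a, end))
                runs.append((a, end, p))
    return sorted(runs)


def distinct_squares(w):
    """Collect the distinct squares of w by enumerating them inside runs."""
    squares = set()
    for i, j, p in naive_runs(w):
        half = p  # half-length of the square, a multiple of the run's period
        while 2 * half <= j - i + 1:
            for s in range(i, j - 2 * half + 2):
                squares.add(w[s:s + 2 * half])
            half += p
    return squares


def is_primitive(w):
    """A nonempty word is primitive iff it occurs in w + w only at 0 and len(w)."""
    return len(w) > 0 and (w + w).find(w, 1) == len(w)
```

For example, on the word `mississippi` this sketch reports the run `(1, 7, 3)`, that is, the factor `ississi` with period 3, together with three runs of period 1 (`ss`, `ss`, `pp`). Every square lies inside a run whose period divides half the square's length, which is why scanning runs suffices in `distinct_squares`.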

[1]  Robert E. Tarjan, et al. A Linear-Time Algorithm for a Special Case of Disjoint Set Union, 1985, J. Comput. Syst. Sci.

[2]  Wojciech Rytter, et al. Efficient Algorithms for Two Extensions of LPF Table: The Power of Suffix Arrays, 2010, SOFSEM.

[3]  Gregory Kucherov, et al. On Maximal Repetitions in Words, 1999, FCT.

[4]  Gonzalo Navarro, et al. Position-Restricted Substring Searching, 2006, LATIN.

[5]  Aviezri S. Fraenkel, et al. How Many Squares Can a String Contain?, 1998, J. Comb. Theory, Ser. A.

[6]  Maxime Crochemore, et al. Improved algorithms for the range next value problem and applications, 2012, Theor. Comput. Sci.

[7]  Lucian Ilie, et al. A note on the number of squares in a word, 2007, Theor. Comput. Sci.

[8]  Robert E. Tarjan, et al. Fast Algorithms for Finding Nearest Common Ancestors, 1984, SIAM J. Comput.

[9]  Gang Chen, et al. Fast and Practical Algorithms for Computing All the Runs in a String, 2007, CPM.

[10]  Timothy M. Chan, et al. Orthogonal range searching on the RAM, revisited, 2011, SoCG '11.

[11]  Gregory Kucherov, et al. Finding maximal repetitions in a word in linear time, 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[12]  Volker Heun, et al. A New Succinct Representation of RMQ-Information and Improvements in the Enhanced Suffix Array, 2007, ESCAPE.

[13]  Wojciech Rytter, et al. Jewels of stringology, 2002.

[14]  Maxime Crochemore, et al. Algorithms on strings, 2007.

[15]  Jamie Simpson. Modified Padovan words and the maximum number of runs in a word, 2010, Australas. J Comb.

[16]  Lucian Ilie, et al. A simple proof that a word of length n has at most 2n distinct squares, 2005, J. Comb. Theory A.

[17]  Wojciech Rytter, et al. Efficient Data Structures for the Factor Periodicity Problem, 2012, SPIRE.

[18]  Moshe Lewenstein, et al. Generalized Substring Compression, 2009, CPM.

[19]  Juhani Karhumäki, et al. Combinatorics on words: a tutorial, 2003, Bull. EATCS.

[20]  Srinivas Aluru, et al. Space efficient linear time construction of suffix arrays, 2005, J. Discrete Algorithms.

[21]  Wojciech Rytter, et al. Repetitions in strings: Algorithms and combinatorics, 2009, Theor. Comput. Sci.

[22]  Wojciech Rytter, et al. Extracting Powers and Periods in a String from Its Runs Structure, 2010, SPIRE.

[23]  Wojciech Rytter, et al. On the Maximal Number of Cubic Subwords in a String, 2009, IWOCA.

[24]  Lucian Ilie, et al. The "runs" conjecture, 2011, Theor. Comput. Sci.

[25]  M. Lothaire, et al. Applied Combinatorics on Words, 2005.

[26]  Peter Sanders, et al. Simple Linear Work Suffix Array Construction, 2003, ICALP.

[27]  Michael G. Main, et al. Detecting leftmost maximal periodicities, 1989, Discret. Appl. Math.

[28]  Lucian Ilie, et al. Practical Algorithms for the Longest Common Extension Problem, 2009, SPIRE.

[29]  M. Crochemore, et al. Algorithms on Strings: Tools, 2007.

[30]  Jens Stoye, et al. Linear time algorithms for finding and representing all the tandem repeats in a string, 2004, J. Comput. Syst. Sci.

[31]  Arnaud Lefebvre, et al. Linear-time computation of local periods, 2004, Theor. Comput. Sci.