Factorizing a String into Squares in Linear Time

A square factorization of a string w is a factorization of w in which each factor is a square, that is, a string of the form xx for some nonempty string x. Dumitran et al. [SPIRE 2015, pp. 54-66] showed how to find a square factorization of a given string of length n in O(n log n) time, and asked whether this can be done in O(n) time. In this paper, we answer their question positively by presenting an O(n)-time algorithm for square factorization in the standard word RAM model with machine word size ω = Ω(log n). We also present an O(n + (n log^2 n) / ω)-time algorithm that finds a square factorization with the maximum number of squares, and an O(n log n)-time algorithm that finds one with the minimum number of squares.
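To make the definition concrete, the following is a minimal Python sketch of a naive dynamic program that finds some square factorization if one exists. It is not the O(n)-time algorithm of the paper, nor the O(n log n)-time method of Dumitran et al.; it runs in roughly cubic time and only illustrates what is being computed. The names is_square and square_factorization are hypothetical and chosen for this illustration.

```python
from typing import List, Optional


def is_square(s: str) -> bool:
    """Return True if s has the form xx for some nonempty string x."""
    half, rem = divmod(len(s), 2)
    return len(s) > 0 and rem == 0 and s[:half] == s[half:]


def square_factorization(w: str) -> Optional[List[str]]:
    """Naive DP sketch (assumption: illustrative only, not the paper's method).

    Returns a list of squares whose concatenation is w, or None if
    no square factorization exists. The empty string yields [].
    """
    n = len(w)
    # reachable[i] is True if the prefix w[:i] has a square factorization.
    reachable = [False] * (n + 1)
    reachable[0] = True
    # prev[i] is the start index of the last factor of one such factorization.
    prev: List[Optional[int]] = [None] * (n + 1)

    # Every square has even length, so only even prefix lengths can be reachable.
    for i in range(2, n + 1, 2):
        for j in range(i - 2, -1, -2):
            if reachable[j] and is_square(w[j:i]):
                reachable[i] = True
                prev[i] = j
                break

    if not reachable[n]:
        return None

    # Walk back through prev to recover the factors from right to left.
    factors: List[str] = []
    i = n
    while i > 0:
        j = prev[i]
        factors.append(w[j:i])
        i = j
    return factors[::-1]
```

For example, square_factorization("aabb") returns ["aa", "bb"], while square_factorization("aba") returns None because a string of odd length cannot be split into squares. Replacing the Boolean table with a table of factor counts would extend the same sketch to the maximum and minimum variants, again without the time bounds stated above.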

[1] Florin Manea, et al. On Prefix/Suffix-Square Free Words, 2015, SPIRE.

[2] Shunsuke Inenaga, et al. Diverse Palindromic Factorization is NP-Complete, 2018, Int. J. Found. Comput. Sci.

[3] Torben Hagerup, et al. Sorting and Searching on the Word RAM, 1998, STACS.

[4] Maxime Crochemore, et al. An Optimal Algorithm for Computing the Repetitions in a Word, 1981, Inf. Process. Lett.

[5] Jean Pierre Duval, et al. Factorizing Words over an Ordered Alphabet, 1983, J. Algorithms.

[6] Michael G. Main, et al. An O(n log n) Algorithm for Finding All Repetitions in a String, 1984, J. Algorithms.

[7] Victor Mitrana, et al. Prefix-suffix duplication, 2014, J. Comput. Syst. Sci.

[8] Juha Kärkkäinen, et al. A subquadratic algorithm for minimum palindromic factorization, 2014, J. Discrete Algorithms.

[9] Manfred Kufleitner. On Bijective Variants of the Burrows-Wheeler Transform, 2009, Stringology.

[10] Wojciech Rytter, et al. Squares, cubes, and time-space efficient string searching, 1995, Algorithmica.

[11] Costas S. Iliopoulos, et al. Closed Factorization, 2014, Stringology.

[12] Shunsuke Inenaga, et al. Computing palindromic factorizations and palindromic covers on-line, 2014.

[13] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[14] R. Lyndon. On Burnside’s problem, 1954.

[15] Terry A. Welch, et al. A Technique for High-Performance Data Compression, 1984, Computer.

[16] Arseny M. Shur, et al. Pal^k is Linear Recognizable Online, 2015, SOFSEM.

[17] Gregory Kucherov, et al. Finding maximal repetitions in a word in linear time, 1999, 40th Annual Symposium on Foundations of Computer Science.

[18] James A. Storer, et al. Data compression via textual substitution, 1982, JACM.

[19] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[20] Lucian Ilie, et al. A Simple Algorithm for Computing the Lempel Ziv Factorization, 2008, Data Compression Conference (DCC 2008).

[21] Abraham Lempel, et al. A universal algorithm for sequential data compression, 1977, IEEE Trans. Inf. Theory.