Efficient Enumeration of Distinct Factors Using Package Representations

We investigate properties and applications of a new compact representation of string factors: families of packages. In a string T, each package (i, ℓ, k) represents the factors of T of length ℓ that start in the interval [i, i + k]. A family F of packages represents the set Factors(F), defined as the union of the sets of factors represented by the individual packages in F. We show how to efficiently enumerate Factors(F) and demonstrate that this is a generic tool for enumerating important classes of factors of T, such as powers and antipowers. Our approach is conceptually simpler than problem-specific methods and provides a unifying framework for such problems, which we hope can be exploited further. We also consider a special case of the problem in which every occurrence of every factor represented by F is captured by some package in F. For both applications mentioned above, we construct an efficient package representation that satisfies this property. We develop efficient algorithms that, given a family F of m packages in a string of length n, report all distinct factors represented by these packages in O(n log n + m log n + |Factors(F)|) time in the general case and in the optimal O(n + m + |Factors(F)|) time in the special case. We can also compute |Factors(F)| in O(n log n + m log n) time in the general case and in O(n + m) time in the special case. In particular, we improve on the state-of-the-art O(nk log k log n)-time algorithm for computing the number of distinct k-antipower factors by providing an algorithm that runs in O(nk) time, and we obtain an alternative linear-time algorithm to enumerate distinct squares.

* Partially supported by ERC grant TOTAL under the EU's Horizon 2020 Research and Innovation Programme (agreement no. 677651).
** Supported by ISF grants no. 1278/16 and 1926/19, by BSF grant no. 2018364, and by ERC grant MPM under the EU's Horizon 2020 Research and Innovation Programme (grant no. 683064).
*** Supported by the Polish National Science Center, grant no. 2018/31/D/ST6/03991.
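To make the package definitions concrete, here is a brute-force Python sketch. It only illustrates what Factors(F) means and is not the algorithm described in the abstract: `factors` expands every package into all of its occurrences and deduplicates them with a set, so its running time depends on the total number of represented occurrences rather than on |Factors(F)|. The helpers `packages_from_predicate`, `is_square` and `is_antipower` are hypothetical names introduced for illustration, and 0-based positions are an assumption of the sketch.

```python
from typing import Callable, Iterable, List, Set, Tuple

# A package (i, l, k) stands for the factors T[j:j+l] for every start j in [i, i + k].
# Positions are 0-based here; this indexing convention is an assumption of the sketch.
Package = Tuple[int, int, int]


def factors(T: str, F: Iterable[Package]) -> Set[str]:
    """Naive Factors(F): union of the factors represented by each package.

    Materializes every represented occurrence, so it does NOT achieve the
    O(n log n + m log n + |Factors(F)|) bound stated in the abstract.
    """
    out: Set[str] = set()
    for i, l, k in F:
        for j in range(i, i + k + 1):
            out.add(T[j:j + l])
    return out


def packages_from_predicate(T: str, lengths: Iterable[int],
                            is_good: Callable[[str], bool]) -> List[Package]:
    """Hypothetical helper: for each length l, turn every maximal interval of
    consecutive starts j with is_good(T[j:j+l]) into one package. Every
    occurrence of every good factor is therefore captured by some package."""
    F: List[Package] = []
    n = len(T)
    for l in lengths:
        start = None  # first start of the current interval of good positions
        # j = n - l + 1 is a sentinel position that closes the last open interval
        for j in range(n - l + 2):
            good = j <= n - l and is_good(T[j:j + l])
            if good and start is None:
                start = j
            elif not good and start is not None:
                F.append((start, l, j - 1 - start))
                start = None
    return F


def is_square(s: str) -> bool:
    """s = uu for some non-empty u."""
    h = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s[:h] == s[h:]


def is_antipower(s: str, k: int) -> bool:
    """s is a k-antipower: k pairwise distinct blocks of equal positive length."""
    if len(s) == 0 or len(s) % k != 0:
        return False
    d = len(s) // k
    blocks = [s[t * d:(t + 1) * d] for t in range(k)]
    return len(set(blocks)) == k


if __name__ == "__main__":
    T = "aaaa"
    F = packages_from_predicate(T, range(2, len(T) + 1, 2), is_square)
    print(F)                       # [(0, 2, 2), (0, 4, 0)]
    print(sorted(factors(T, F)))   # ['aa', 'aaaa']

    S = "abaab"
    G = packages_from_predicate(S, range(2, len(S) + 1, 2), lambda s: is_antipower(s, 2))
    print(G)                       # [(0, 2, 1), (3, 2, 0), (0, 4, 1)]
    print(len(factors(S, G)))      # 4 distinct 2-antipower factors
```

Grouping consecutive start positions into one package is what keeps the representation compact: in `aaaa`, the three occurrences of `aa` collapse into the single package (0, 2, 2), yet they contribute only one distinct factor, which is why deduplication is the crux of enumerating Factors(F). Because the helper records every start position at which a square (or antipower) of the given length occurs, the families it builds fall under the special case in which every occurrence of every represented factor is captured by some package.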
