Fast Algorithm for Partial Covers in Words

A factor $$u$$ of a word $$w$$ is a cover of $$w$$ if every position in $$w$$ lies within some occurrence of $$u$$ in $$w$$. A word $$w$$ covered by $$u$$ thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of $$u$$. In this article we introduce the new notion of an $$\alpha$$-partial cover, which can be viewed as a relaxed variant of a cover: a factor covering at least $$\alpha$$ positions in $$w$$. We develop a data structure of $$\mathcal{O}(n)$$ size (where $$n=|w|$$), constructible in $$\mathcal{O}(n\log n)$$ time, which we apply to compute all shortest $$\alpha$$-partial covers for a given $$\alpha$$. We also employ it in an $$\mathcal{O}(n\log n)$$-time algorithm that computes a shortest $$\alpha$$-partial cover for each $$\alpha =1,2,\ldots ,n$$.
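To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the $$\mathcal{O}(n\log n)$$ data structure described in the abstract; it only illustrates what "covering at least $$\alpha$$ positions" means. The function names `covered_positions` and `shortest_partial_covers` are hypothetical and chosen for illustration.

```python
def covered_positions(w: str, u: str) -> int:
    """Count the positions of w that lie inside at least one occurrence of u."""
    if not u:
        return 0
    covered = 0
    last_end = 0  # first position not yet counted as covered
    start = w.find(u)
    while start != -1:
        end = start + len(u)
        # Count only the part of this occurrence not already counted.
        covered += end - max(start, last_end)
        last_end = end
        # Step by one so that overlapping occurrences are found too.
        start = w.find(u, start + 1)
    return covered


def shortest_partial_covers(w: str, alpha: int) -> list[str]:
    """Naive search: all shortest factors of w covering at least alpha positions.

    Illustrative only; this enumerates every factor of each length and is far
    slower than the algorithm announced in the abstract.
    """
    n = len(w)
    for length in range(1, n + 1):
        factors = {w[i:i + length] for i in range(n - length + 1)}
        hits = sorted(u for u in factors if covered_positions(w, u) >= alpha)
        if hits:
            return hits
    return []  # only reached when alpha > len(w)


if __name__ == "__main__":
    print(covered_positions("abaaba", "aba"))    # "aba" is a cover of "abaaba"
    print(shortest_partial_covers("abcab", 4))   # shortest 4-partial covers
```

Under these definitions a factor $$u$$ is a cover of $$w$$ exactly when `covered_positions(w, u) == len(w)`, so an ordinary cover is the special case $$\alpha = n$$.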
