Experimental Evaluation of Algorithms for Computing Quasiperiods

Quasiperiodicity is a generalization of periodicity that was introduced in the early 1990s. Since then, dozens of algorithms for computing various types of quasiperiodicity have been proposed. Our work is a step towards answering the question: "Which algorithm for computing quasiperiods should one choose in practice?" The central notions of quasiperiodicity are covers and seeds. We implement algorithms for computing covers and seeds, both in their original form and in new simplified versions, and compare their efficiency on various types of data. We also discuss other known types of quasiperiodicity, identify partial covers as currently the most promising for large real-world data, and check their effectiveness on real-world data.
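
To make the notion of a cover concrete: a string w is a cover of a string s if every position of s lies inside some occurrence of w in s. The following is a minimal brute-force sketch of that standard definition only. It is not one of the algorithms evaluated in this work, and the function names (`occurrences`, `is_cover`, `covers`) are hypothetical, chosen for illustration.

```python
def occurrences(w: str, s: str) -> list[int]:
    """Start indices of all (possibly overlapping) occurrences of w in s."""
    starts = []
    i = s.find(w)
    while i != -1:
        starts.append(i)
        i = s.find(w, i + 1)  # step by 1 so overlapping occurrences are found
    return starts


def is_cover(w: str, s: str) -> bool:
    """True if the occurrences of w in s cover every position of s."""
    if not w or len(w) > len(s):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in occurrences(w, s):
        if i > covered_up_to:
            return False  # a gap: position covered_up_to is not covered
        covered_up_to = i + len(w)
    return covered_up_to == len(s)


def covers(s: str) -> list[str]:
    """All covers of s, by naive testing of every prefix (illustration only)."""
    return [s[:k] for k in range(1, len(s) + 1) if is_cover(s[:k], s)]


print(covers("abababa"))
```

Every cover of s is necessarily a prefix (and a suffix) of s, which is why the sketch only tests prefixes. The naive approach takes superlinear time, whereas the efficient algorithms in the literature are far faster; it is meant only to pin down the definition.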
