Computing regularities in strings: A survey

This survey provides insight into the sequential algorithms that have been proposed to compute exact "regularities" in strings: covers (or quasiperiods), seeds, repetitions, runs (or maximal periodicities), and repeats. After outlining and evaluating these algorithms, I suggest potentially productive directions for future research.
