Solving the String Statistics Problem in Time O(n log n)

The string statistics problem consists of preprocessing a string of length \( n \) such that, given a query pattern of length \( m \), the maximum number of non-overlapping occurrences of the query pattern in the string can be reported efficiently. Apostolico and Preparata introduced the minimal augmented suffix tree (MAST) as a data structure for the string statistics problem, and showed how to construct the MAST in time \( \mathcal{O}(n \log^2 n) \) and how it supports queries in time \( \mathcal{O}(m) \) for constant-sized alphabets. A subsequent theorem by Fraenkel and Simpson stating that a string has at most a linear number of distinct squares implies that the MAST requires space \( \mathcal{O}(n) \). In this paper we improve the construction time for the MAST to \( \mathcal{O}(n \log n) \) by extending the algorithm of Apostolico and Preparata to exploit properties of efficient joining and splitting of search trees together with a refined analysis.
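To make the query concrete, the following is a minimal sketch of what a single string statistics query computes, written as a naive scan with no preprocessing. It is not the MAST and not the construction described in the paper; it is only a brute-force baseline. The function names `max_nonoverlapping_occurrences` and `all_occurrences` are hypothetical and chosen for illustration. The scan takes the leftmost occurrence and then resumes searching after its end, which yields the maximum number of non-overlapping occurrences because all occurrences have the same length.

```python
def max_nonoverlapping_occurrences(text: str, pattern: str) -> int:
    """Return the maximum number of non-overlapping occurrences of pattern in text.

    Naive baseline: scans the text for every query instead of using a
    preprocessed index such as the MAST.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        # Resume after the end of this match so the next one cannot overlap it.
        pos = text.find(pattern, pos + len(pattern))
    return count


def all_occurrences(text: str, pattern: str) -> int:
    """Return the number of all occurrences, overlapping ones included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        # Advance by one position only, so overlapping matches are counted.
        pos = text.find(pattern, pos + 1)
    return count


if __name__ == "__main__":
    # "aa" occurs four times in "aaaaa", but only two occurrences can be disjoint.
    print(all_occurrences("aaaaa", "aa"))
    print(max_nonoverlapping_occurrences("aaaaa", "aa"))
```

The gap between the two counts is what distinguishes the string statistics problem from plain occurrence counting: the number of all occurrences can be read off an ordinary suffix tree, whereas the non-overlapping count requires the additional information that the MAST stores.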
