Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. Considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but their worst-case running time is still $\Theta(n^2)$. We present the first truly subquadratic-time solutions for a number of such problems on partial words; these solutions can also be adapted to indeterminate strings over a constant-sized alphabet. We show that $n$ longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in $O(n\sqrt{n\log n})$ time after $O(n\sqrt{n\log n})$-time preprocessing. We also present $O(n\sqrt{n\log n})$-time algorithms for computing the prefix array and two types of border array of a partial word.
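
To make the queries concrete, the following is a minimal Python sketch of the naive baseline, not the subquadratic algorithm announced above. It assumes that `?` denotes the don't care symbol and that two symbols are compatible when they are equal or at least one of them is a don't care; the names `HOLE`, `compatible`, `lccp`, and `prefix_array` are illustrative and do not come from the paper. Each query here takes $O(n)$ time, so $n$ queries cost $\Theta(n^2)$ in the worst case.

```python
HOLE = "?"  # assumed notation for the don't care symbol


def compatible(a: str, b: str) -> bool:
    """Two symbols match if they are equal or either one is a don't care."""
    return a == b or a == HOLE or b == HOLE


def lccp(w: str, i: int, j: int) -> int:
    """Length of the longest common compatible prefix of w[i:] and w[j:].

    Naive scan: O(n) per query in the worst case.
    """
    n = len(w)
    k = 0
    while i + k < n and j + k < n and compatible(w[i + k], w[j + k]):
        k += 1
    return k


def prefix_array(w: str) -> list[int]:
    """Entry i is the LCCP length of w and its suffix starting at position i."""
    return [lccp(w, 0, i) for i in range(len(w))]


if __name__ == "__main__":
    print(lccp("a?ab?b", 0, 2))
    print(prefix_array("ab?b"))
```

Compatibility is not transitive (`a` matches `?` and `?` matches `b`, yet `a` does not match `b`), which is one reason the standard linear-time techniques for regular strings do not carry over directly.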
