Non-standard stringology: algorithms and complexity

Non-standard stringology concerns string matching problems in which a position in the “text” (of size n) matches a position in the “pattern” (of size m) based on very general relationships between the corresponding “symbols”. For example, string matching with don’t cares is a simple non-standard string matching problem in which text and/or pattern positions may hold wildcard symbols rather than symbols drawn from the base alphabet Σ; these wildcards match every symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string matching problems, characterized in terms of algebraic convolutions.

Non-standard Basic String Matching:

● For three problems from this family, including string matching with don’t cares and its generalizations, we prove a lower bound of Ω(f(|Σ|)) convolutions, where the (increasing) function f depends on the problem. These results are proved in the boolean convolution model that we introduce here. We also match this bound by improving or adapting the best-known algorithms to this model.

● In the RAM model we show, using reductions, that all of these problems encode a variant of truncated boolean convolution with integer parameter n. These reductions allow us to infer that any improvement to the best-known algorithms for these problems (in the RAM) will yield faster algorithms for solving parametrized truncated convolutions of the input vectors. As we show here, the fastest algorithm for the latter problem, derived by extending the scheme from [KO89], uses O(min{7r, JR}) convolutions.

We also derive analogous results for eight other variants of non-standard string matching problems that are drawn from the following two families: non-standard counting string matching (e.g., a variant of classical string matching that involves counting the number of mismatches at each text position), and non-standard threshold string matching (in which the k-mismatches problem is a basic example).

Interestingly, all of the above results are derived using the structure of the “match graph” defined by the matching relation of the given instance of the non-standard string matching problem, and its complement. It turns out that our lower bounds and reductions depend upon the sizes of the induced cliques in these graphs: in particular, the “dominating” cliques in the match graph, and the “clique edge covers/partitions” in its complement. We also provide improved deterministic and randomized algorithms, as well as those with better expected running times, for some non-standard string matching problems.

* This research was partially supported by NSF/DARPA under grant number CCR-89-06949 and by NSF under grant number CCR-91-03953.
† Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012, USA; muthu@cs.nyu.edu, 212-998-3061.
‡ IBM Research Division, T. J. Watson Research Center, P. O. Box 704, Yorktown Heights, NY 10598, USA; kpalem@watson.ibm.com, 914-984-9846. palem@theory.stanford.edu, 415-723-4405.

STOC '94, May 1994, Montreal, Quebec, Canada. © 1994 ACM 0-89791-663-8/94/0005.
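To make the link between wildcard matching and convolutions concrete, here is a minimal sketch of a simple per-symbol scheme. It is not the algorithm of this paper: it spends one correlation per alphabet symbol, and the names `WILDCARD`, `correlate`, `mismatch_counts` and `match_positions` are hypothetical choices made for this illustration. The correlation is computed naively; an FFT-based convolution would produce the same vectors faster.

```python
WILDCARD = "?"  # hypothetical wildcard marker chosen for this sketch


def correlate(a, b):
    """Return c[i] = sum_j a[i + j] * b[j] for every alignment i of b inside a.

    Naive O(len(a) * len(b)) version; an FFT-based convolution would
    compute the same vector asymptotically faster.
    """
    n, m = len(a), len(b)
    return [sum(a[i + j] * b[j] for j in range(m)) for i in range(n - m + 1)]


def mismatch_counts(text, pattern, alphabet):
    """Count mismatches at each alignment, using one correlation per symbol.

    A wildcard in the text or in the pattern never contributes a mismatch.
    """
    n, m = len(text), len(pattern)
    if m > n:
        return []
    counts = [0] * (n - m + 1)
    for s in alphabet:
        # Indicator of text positions that hold the symbol s.
        text_is_s = [1 if c == s else 0 for c in text]
        # Indicator of pattern positions holding a concrete symbol other than s.
        pat_not_s = [1 if c != s and c != WILDCARD else 0 for c in pattern]
        for i, v in enumerate(correlate(text_is_s, pat_not_s)):
            counts[i] += v
    return counts


def match_positions(text, pattern, alphabet, k=0):
    """Alignments with at most k mismatches (k = 0 gives don't-care matching)."""
    return [i for i, c in enumerate(mismatch_counts(text, pattern, alphabet)) if c <= k]


if __name__ == "__main__":
    text, pattern = "abca?cab", "a?c"
    print(mismatch_counts(text, pattern, "abc"))       # [0, 2, 1, 0, 1, 2]
    print(match_positions(text, pattern, "abc"))       # [0, 3]
    print(match_positions(text, pattern, "abc", k=1))  # [0, 2, 3, 4]
```

The same mismatch vector serves all three flavours mentioned in the abstract: testing for zero gives basic matching with don't cares, the raw counts give the counting variant, and comparing against k gives the k-mismatches threshold variant.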

[1]  Chak-Kuen Wong,et al.  Bounds for the String Editing Problem , 1976, JACM.

[2]  Eugene L. Lawler,et al.  Approximate string matching in sublinear expected time , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[3]  S. Rao Kosaraju,et al.  Efficient Tree Pattern Matching (Preliminary Version) , 1989, FOCS 1989.

[4]  Ingo Wegener,et al.  The complexity of Boolean functions , 1987 .

[5]  Dany Breslauer,et al.  Dictionary-Matching on Unbounded Alphabets: Uniform Length Dictionaries , 1994, J. Algorithms.

[6]  Zvi Galil,et al.  Faster tree pattern matching , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[7]  Wojciech Rytter,et al.  Optimally fast parallel algorithms for preprocessing and pattern matching in one and two dimensions , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[8]  Brian W. Kernighan,et al.  The UNIX™ programming environment , 1979, Softw. Pract. Exp..

[9]  Raffaele Giancarlo,et al.  Data structures and algorithms for approximate string matching , 1988, J. Complex..

[10]  Christoph M. Hoffmann,et al.  Pattern Matching in Trees , 1982, JACM.

[11]  Richard M. Karp,et al.  Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev..

[12]  Maxime Crochemore,et al.  Two-way string-matching , 1991, JACM.

[13]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[14]  M. Fischer,et al.  STRING-MATCHING AND OTHER PRODUCTS , 1974 .

[15]  S. Rao Kosaraju,et al.  Efficient tree pattern matching , 1989, 30th Annual Symposium on Foundations of Computer Science.

[16]  Gad M. Landau,et al.  Fast Parallel and Serial Multidimensional Approximate Array Matching , 1991, Theor. Comput. Sci..

[17]  Theodore P. Baker.  A Technique for Extending Rapid Exact-Match String Matching to Arrays of More Than One Dimension , 1978, SIAM J. Comput..

[18]  Alfred V. Aho,et al.  Bounds on the Complexity of the Longest Common Subsequence Problem , 1976, J. ACM.

[19]  Arnold L. Rosenberg,et al.  Rapid identification of repeated patterns in strings, trees and arrays , 1972, STOC.

[20]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[21]  R.S. Bird,et al.  Two Dimensional Pattern Matching , 1977, Inf. Process. Lett..

[22]  Alfred V. Aho,et al.  The Design and Analysis of Computer Algorithms , 1974 .

[23]  Howard J. Karloff.  Fast Algorithms for Approximately Counting Mismatches , 1993, Inf. Process. Lett..

[24]  Udi Manber,et al.  Fast text searching: allowing errors , 1992, CACM.

[25]  I. Anderson Combinatorics of Finite Sets , 1987 .

[26]  Gary Benson,et al.  Alphabet independent two dimensional matching , 1992, STOC '92.

[27]  Raffaele Giancarlo,et al.  On the exact complexity of string matching , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[28]  Zvi Galil,et al.  A Lower Bound for Parallel String Matching , 1992, SIAM J. Comput..

[29]  Raffaele Giancarlo,et al.  On the Exact Complexity of String Matching: Lower Bounds , 1991, SIAM J. Comput..

[30]  Zvi Galil,et al.  Open Problems in Stringology , 1985 .

[31]  S. Muthukrishnan,et al.  String Matching Under a General Matching Relation , 1992, Inf. Comput..

[32]  Amihood Amir,et al.  Efficient 2-dimensional approximate matching of non-rectangular figures , 1991, SODA '91.

[33]  Zvi Galil,et al.  Truly alphabet-independent two-dimensional pattern matching , 1992, Proceedings., 33rd Annual Symposium on Foundations of Computer Science.

[34]  Karl R. Abrahamson Generalized String Matching , 1987, SIAM J. Comput..

[35]  Raffaele Giancarlo,et al.  On the Exact Complexity of String Matching: Upper Bounds , 1992, SIAM J. Comput..

[36]  Zvi Galil,et al.  Time-Space-Optimal String Matching , 1983, J. Comput. Syst. Sci..