Efficient Parallel and Serial Approximate String Matching

Consider the string matching problem in which differences between characters of the pattern and characters of the text are allowed. Each difference is due to a mismatch between a character of the text and a character of the pattern, a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m, and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The first part of the parallel algorithm analyzes the pattern and takes O(log m) time using m^2 processors. The rest of the algorithm handles the text, using the following new approach. It first obtains a concise characterization of the text, based solely on substrings of the pattern, in O(log m) time using n / log m processors. The desired output is then derived from this characterization, together with the tables built in the first part, in O(k) time using n processors. The serial algorithm also follows this new approach for handling the text. It runs in O(kn) time for an alphabet whose size is fixed. For general input the algorithm requires O(n(k + log m)) time. In both cases the space requirement is O(n).
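To make the problem statement concrete, the following is a minimal sketch of the classical O(mn) dynamic-programming baseline for matching with at most k differences. It is not the algorithm described above, which is faster; it only illustrates the three kinds of difference. The function name `k_differences_ends` and the convention of reporting 1-based end positions are assumptions made for this illustration.

```python
def k_differences_ends(text, pattern, k):
    """Return the 1-based positions j such that some substring of `text`
    ending at j matches `pattern` with at most k differences.

    Baseline dynamic program (hypothetical helper, not the paper's method).
    """
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]   # value for (i - 1, j - 1)
        col[0] = 0           # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]     # value for (i, j - 1)
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or mismatch
                old + 1,                            # superfluous text character
                col[i - 1] + 1,                     # superfluous pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_differences_ends("abcdefg", "bcd", 0))
    print(k_differences_ends("abcdefg", "bcd", 1))
```

With k = 0 the sketch reduces to exact matching. It uses O(m) working space and O(mn) time regardless of k, which is the cost the O(kn) serial bound improves on when k is small relative to m.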
