Deterministic sampling—a new technique for fast pattern matching

Consider the following three-stage strategy for recognizing patterns in larger scenes: (1) Mimic randomization deterministically: sample several positions of the pattern. (2) Search for the sample: find all occurrences of the sample in the scene. (3) Verify: for each occurrence of the sample, verify that the full pattern occurs there. This strategy led to the core of our new idea.

Consider the string matching problem. Given the pattern, we carefully select a sample of its positions whose size is at most logarithmic in the pattern length (the deterministic sample). Then we search for the sample. For nonperiodic patterns, the sample has the following, perhaps surprising, property: within each "neighborhood" of locations in the text, all occurrences of the sample but one can be disqualified without any further character comparisons. This provides sparse verification.

This approach makes it possible to perform the text analysis (the stages "search for the sample" and "verify") in O(log* n) time with optimal speed-up on a PRAM. This improves on the previous fastest optimal speed-up result. It also leads to a new linear-time serial algorithm for string matching. We expect the approach to be applicable to pragmatic pattern recognition problems.

† The research of this author was supported by NSF grants CCR-8615337 and CCR-8906949 and ONR grant N00014-85-
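The three-stage strategy from the abstract can be illustrated with a minimal serial sketch in Python. It is not the algorithm of the paper: the sample here is simply a set of roughly logarithmically many evenly spaced positions (an assumption made for illustration), not the carefully selected deterministic sample, so it does not give the sparse verification property or the stated time bounds. The function names `sample_positions`, `find_sample`, `verify`, and `match` are hypothetical.

```python
def sample_positions(pattern):
    """Stage 1 (illustrative only): pick about log2(m) evenly spaced positions.

    Assumption: evenly spaced positions stand in for the paper's
    deterministic sample, whose construction is not reproduced here.
    """
    m = len(pattern)
    size = max(1, min(m, m.bit_length()))  # roughly logarithmic in m
    step = m / size
    return sorted({int(i * step) for i in range(size)})


def find_sample(text, pattern, positions):
    """Stage 2: find every text location where the sampled characters match."""
    n, m = len(text), len(pattern)
    return [
        s
        for s in range(n - m + 1)
        if all(text[s + p] == pattern[p] for p in positions)
    ]


def verify(text, pattern, candidates):
    """Stage 3: check the full pattern at each surviving candidate."""
    m = len(pattern)
    return [s for s in candidates if text[s:s + m] == pattern]


def match(text, pattern):
    """Return all start indices of pattern in text via sample, search, verify."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = sample_positions(pattern)
    return verify(text, pattern, find_sample(text, pattern, positions))
```

For example, `match("abracadabra", "abra")` returns `[0, 7]`. In this naive version every candidate from stage 2 is verified in full; the point of the deterministic sample described above is that, for nonperiodic patterns, most candidates in a neighborhood can be discarded before verification.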
