Finding Patterns with Variable Length Gaps or Don't Cares

In this paper we present new algorithms for the pattern matching problem in which the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T, our first algorithm runs in $O(n + m + \alpha \log(\max_{1 \le i \le l}(b_i - a_i)))$ time, where n is the length of the text, m is the sum of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text, and $a_i$ and $b_i$ are, respectively, the minimum and maximum number of don't cares allowed between the $i$th and $(i+1)$st components of the pattern. We also present a second algorithm which, given a suffix array of the text, reports whether P occurs in T in $O(m + \alpha \log\log n)$ time. Both algorithms record enough information to report all occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
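To make the problem statement concrete, the following is a minimal Python sketch of matching a pattern with variable length gaps. It is a naive baseline for illustration only, not the algorithms of this paper, and it makes no attempt to reach the time bounds stated above. The function names `find_all` and `match_with_gaps` and the representation of the pattern as a list of non-empty subpatterns plus a list of `(a_i, b_i)` gap bounds are assumptions introduced here.

```python
from bisect import bisect_left
from typing import List, Tuple


def find_all(text: str, sub: str) -> List[int]:
    """Return every start index of sub in text, in increasing order (overlaps allowed)."""
    hits = []
    pos = text.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = text.find(sub, pos + 1)
    return hits


def match_with_gaps(text: str, parts: List[str],
                    gaps: List[Tuple[int, int]]) -> List[int]:
    """Return the sorted end positions (exclusive) of full pattern occurrences.

    parts[i] is the i-th component subpattern (assumed non-empty);
    gaps[i] = (a, b) bounds the number of don't-care characters allowed
    between parts[i] and parts[i + 1].
    """
    if len(gaps) != len(parts) - 1:
        raise ValueError("need exactly one gap bound between consecutive parts")

    # End positions of every occurrence of the first component.
    ends = [s + len(parts[0]) for s in find_all(text, parts[0])]

    for part, (lo, hi) in zip(parts[1:], gaps):
        next_ends = []
        for s in find_all(text, part):
            # An occurrence starting at s extends a partial match if some
            # previous end e satisfies lo <= s - e <= hi.
            i = bisect_left(ends, s - hi)
            if i < len(ends) and ends[i] <= s - lo:
                next_ends.append(s + len(part))
        ends = next_ends  # still sorted, because find_all is increasing

    return ends


if __name__ == "__main__":
    # "ab", then between 0 and 2 don't cares, then "cd".
    print(match_with_gaps("abxxcdabcd", ["ab", "cd"], [(0, 2)]))
```

The sketch keeps only the end positions of partial matches after each component, which is enough to decide whether P occurs in T; recovering the full alignment of every occurrence would require storing back-pointers as well.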
