Matching a Set of Patterns with Wildcards

Multi-pattern matching with wildcards is the problem of finding all occurrences of a set of patterns with wildcards in a text. The problem arises in various fields, such as computational biology and network security. However, it has not been studied as extensively as the single-pattern case, and no efficient algorithm is known for it. In this paper, we present efficient algorithms based on fast Fourier transforms (FFTs). Let $P=\{p_1,\ldots,p_k\}$ be a set of patterns with wildcards whose total length is $|P|$, and let $t$ be a text of length $n$ over the alphabet $\{a_1,\ldots,a_{\sigma}\}$. We present two algorithms for this problem, both of which match all patterns simultaneously. The first finds the matches of a small set of patterns in the text in $O(n\log |P|+nk)$ time, using words of $k\lceil2\lg\sigma\rceil+\sum_{i=1}^k \lceil\lg|p_i|\rceil$ bits. The second finds the matches in $O(n\log |P|\log\sigma+nk)$ time by computing the Hamming distance between the patterns and the text, using words of $\sum_{i=1}^k \lceil\lg|p_i|\rceil$ bits. We also describe an FFT implementation based on modular arithmetic for machines with a 64-bit word size. Finally, we show that both algorithms are easily parallelized, and we give the parallel versions.
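For orientation, the following is a minimal sketch of convolution-based wildcard matching, not the algorithms of this paper. It assumes the classic single-pattern identity: with wildcards encoded as 0 and symbols as positive integers, pattern $p$ matches at position $i$ exactly when $\sum_j p_j t_{i+j}(p_j-t_{i+j})^2=0$. The sketch runs this test once per pattern, which costs $O(kn\log n)$ time and does not achieve the simultaneous bounds stated above. The convolutions use a transform over modular arithmetic. The modulus 998244353, the wildcard character `?`, and all function names are assumptions made for illustration.

```python
MOD = 998244353  # prime 119 * 2**23 + 1 (illustrative choice of modulus)
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """In-place transform modulo MOD; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def correlate(x, y):
    """Return c[i] = sum_j x[j] * y[i + j] (mod MOD) for each full alignment i."""
    m, n = len(x), len(y)
    size = 1
    while size < m + n:
        size <<= 1
    fa = list(reversed(x)) + [0] * (size - m)  # reversing turns convolution into correlation
    fb = list(y) + [0] * (size - n)
    ntt(fa)
    ntt(fb)
    fc = [p * q % MOD for p, q in zip(fa, fb)]
    ntt(fc, invert=True)
    return fc[m - 1:n]


def wildcard_matches(pattern, text, wildcard="?"):
    """Positions where pattern occurs in text; wildcard matches any symbol."""
    symbols = sorted(set(pattern + text) - {wildcard})
    code = {ch: i + 1 for i, ch in enumerate(symbols)}
    code[wildcard] = 0
    # The true sum must stay below MOD so that "0 mod MOD" means "exactly 0".
    assert len(pattern) * max(len(symbols), 1) ** 4 < MOD
    p = [code[ch] for ch in pattern]
    t = [code[ch] for ch in text]
    # Expand sum p*t*(p - t)^2 = sum p^3*t - 2*sum p^2*t^2 + sum p*t^3.
    c1 = correlate([v ** 3 for v in p], t)
    c2 = correlate([v ** 2 for v in p], [v ** 2 for v in t])
    c3 = correlate(p, [v ** 3 for v in t])
    return [i for i, (a, b, c) in enumerate(zip(c1, c2, c3))
            if (a - 2 * b + c) % MOD == 0]


def match_all(patterns, text):
    """Baseline: one independent pass per pattern (not simultaneous)."""
    return {p: wildcard_matches(p, text) for p in patterns}


result = match_all(["AC?T", "?AC", "G??C"], "ACGTACGAAC")
```

Here `result` maps each pattern to the list of 0-based text positions at which it matches. The assertion inside `wildcard_matches` guards the one place where modular arithmetic could produce a false match: the sum is a non-negative integer, so as long as it is smaller than the modulus, it is zero modulo the modulus only when it is truly zero.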