The Complexity of Pattern Matching for a Random String

We study the average-case complexity of finding all occurrences of a given pattern $\alpha$ in an input text string. Over an alphabet of q symbols, let c($\alpha$,n) be the minimum average number of characters that need to be examined in a random text string of length n. We prove that, for large m, almost all patterns $\alpha$ of length m satisfy c($\alpha$,n) = $\Theta (\lceil \log_q (${n-m}/{ln m} + 2)\rceil )$ if $m \leq n \leq 2m$, and c($\alpha$,n) = $\Theta ({\lceil \log_q m\rceil}/m n)$ if n < 2m. This in particular confirms a conjecture raised in a recent paper by Knuth, Morris, and Pratt [1977].