String Searching

Suppose we are given a sequence of nucleotides, such as ACGCGCAGGCA, and we wish to find all occurrences of that string in the human genome, which is about 3,000,000,000 bases long. How long would that take? Without pre-processing, it would take at least 3,000,000,000 steps, because we must look at every position in the genome at least once. With pre-processing, we can build an index over the genome that lets us search for such strings much more rapidly. After pre-processing, the search would take just 11 steps, one for each character of the query string.
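
To make the contrast concrete, here is a minimal Python sketch. The text above does not say which index is meant, so the suffix trie used here is an assumption, chosen because a lookup walks exactly one node per query character. The function names and the short stand-in "genome" are hypothetical. This toy trie uses space quadratic in the text length, so it illustrates the idea only and would not scale to a real genome.

```python
def naive_find_all(text, pattern):
    """No pre-processing: try every start position in the text."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        if text[i:i + m] == pattern:
            hits.append(i)
    return hits


def build_suffix_trie(text):
    """Pre-processing: insert every suffix of the text into a trie.

    Nodes are nested dicts keyed by character. The reserved key "$"
    (assumed not to occur in the text) holds the start positions of
    all suffixes that pass through that node.
    """
    root = {"$": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {"$": []})
            node["$"].append(start)
    return root


def trie_find_all(root, pattern):
    """Search: follow one trie edge per character of the pattern.

    Returns (start positions, number of steps taken).
    """
    node = root
    steps = 0
    for ch in pattern:
        steps += 1
        node = node.get(ch)
        if node is None:
            return [], steps  # pattern does not occur in the text
    return node["$"], steps


if __name__ == "__main__":
    genome = "TTACGCGCAGGCATTACGCGCAGGCAGG"  # tiny stand-in for a genome
    query = "ACGCGCAGGCA"
    index = build_suffix_trie(genome)        # done once, up front
    print(naive_find_all(genome, query))     # scans the whole text
    print(trie_find_all(index, query))       # walks len(query) nodes
```

The naive scan does work proportional to the length of the text on every query, while the trie walk depends only on the length of the query. The cost has been moved into the one-time construction of the index.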