A Linear-Time Algorithm for the 1-Mismatch Problem

When working with sequence alignments (which can be viewed simply as rectangular arrays of characters), one frequently needs to identify regions, each a run of consecutive columns, that have some particular property. The 1-mismatch problem is to locate all maximal regions of a given alignment for which there exists a (not necessarily unique) “center” sequence such that, within the region, every row of the alignment is within Hamming distance 1 of the center. We first describe some properties of these regions and their centers, and then use these properties to construct an algorithm that, for a d × n alignment, runs in Θ(nd) time and uses Θ(d) extra space (beyond that needed to store the alignment itself).
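To make the problem statement concrete, here is a minimal brute-force sketch of the definition. It is NOT the Θ(nd) algorithm described above; it is a naive reference that could serve as a test oracle. It assumes the alignment is given as a list of at least one equal-length string (one string per row) and reports regions as 0-based half-open column intervals [i, j). The names `find_center` and `maximal_regions` are hypothetical. Two facts are used. First, any center differs from the first row in at most one position, and if that differing character occurs nowhere in its column, the first row is itself a center. Second, the property is hereditary: every sub-run of a valid region is valid, because restricting the center can only lower distances.

```python
from typing import List, Optional, Tuple


def find_center(rows: List[str], lo: int, hi: int) -> Optional[str]:
    """Return some center for columns [lo, hi) of `rows`, or None if none exists.

    A center is a string within Hamming distance 1 of every row's slice.
    Brute force (hypothetical helper, not the paper's method): try the first
    row, then the first row with one column replaced by a character that
    occurs in that column.
    """
    slices = [r[lo:hi] for r in rows]
    first = slices[0]

    def is_center(c: str) -> bool:
        # Every row slice must differ from c in at most one position.
        return all(sum(a != b for a, b in zip(c, s)) <= 1 for s in slices)

    if is_center(first):
        return first
    for p in range(hi - lo):
        for x in sorted({s[p] for s in slices}):
            if x != first[p]:
                cand = first[:p] + x + first[p + 1:]
                if is_center(cand):
                    return cand
    return None


def maximal_regions(rows: List[str]) -> List[Tuple[int, int]]:
    """Naively list all maximal 1-mismatch regions as half-open [i, j)."""
    n = len(rows[0])
    regions = []
    prev_end = 0  # furthest right end reached from the previous start column
    j = 0
    for i in range(n):
        # Hereditary property: [i, prev_end) is still valid, so the right
        # end never moves left; a single column is always valid.
        j = max(j, i)
        while j < n and find_center(rows, i, j + 1) is not None:
            j += 1
        # [i, j) is maximal iff it cannot be extended one column to the left,
        # i.e. the previous start column did not reach as far right.
        if j > prev_end:
            regions.append((i, j))
        prev_end = j
    return regions
```

For example, with rows `["AAAA", "BBAA", "AABB"]`, `maximal_regions` returns `[(0, 2), (1, 3), (2, 4)]`, which shows that maximal regions may overlap. Each call to `find_center` costs far more than constant time, so this sketch is only suitable for small alignments or for cross-checking a faster implementation.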