Matching with mismatches and assorted applications
This thesis consists of three parts, each of independent interest, yet tied
together by the problem of matching with mismatches. In the first chapter,
we present a motivated exposition of a new randomized algorithm
for indexed matching with mismatches which, for constant error (substitution)
rates, locates a substring of length m within a string of length n
faster than existing algorithms by a factor of O(m/log(n)).
The second chapter turns from this theoretical problem to an entirely
practical concern: delta compression of executable code. In contrast to
earlier work which has either generated very large deltas when applied to
executable code, or has generated small deltas by utilizing platform and
processor-specific knowledge, we present a naive approach — that is, one
which does not rely upon any external knowledge — which nevertheless
constructs deltas of size comparable to those produced by a platform-specific
approach. In the course of this construction, we utilize the result
from the first chapter, although it is of primary utility only when producing
deltas between very similar executables.
The third chapter lies between the horn and ivory gates, being both highly
interesting from a theoretical viewpoint and of great practical value. Using
the algorithm for matching with mismatches from the first chapter,
combined with error correcting codes, we give a practical algorithm for
“universal” delta compression (often called “feedback-free file synchronization”)
which can operate in the presence of multiple indels and a
large number of substitutions.
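
To make the underlying problem concrete, the following is a minimal sketch of matching with mismatches in its simplest, brute-force form: report every offset at which a pattern of length m occurs in a text of length n with at most a given number of substitutions. This is not the randomized indexed algorithm of the first chapter; it is only a naive O(nm) baseline of the kind such an algorithm improves upon. The function name `find_with_mismatches` and the parameter `max_mismatches` are illustrative assumptions, not names taken from the thesis.

```python
def find_with_mismatches(text, pattern, max_mismatches):
    """Return every offset where `pattern` aligns with `text` with at most
    `max_mismatches` substitutions (Hamming distance; no indels).

    Naive O(n*m) baseline for illustration only; hypothetical helper,
    not the algorithm described in the thesis.
    """
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for i in range(m):
            if text[start + i] != pattern[i]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this alignment
        else:
            # Inner loop finished without exceeding the mismatch budget.
            hits.append(start)
    return hits


if __name__ == "__main__":
    # "cde" differs from "cxe" in exactly one position.
    print(find_with_mismatches("abcdefgh", "cxe", 1))
```

Because only substitutions are counted, a single inserted or deleted character shifts every later position and defeats this comparison, which is why the third chapter treats indels separately from substitutions.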