We consider the problem of separating two distinct classes of k similar sequences of length n over an alphabet of size s that have been optimally multi-aligned. We introduce an objective function based on minimizing the consensus score of the separated halves, and we present an O(k^3 n) heuristic algorithm and two optimal branch-and-bound algorithms for the problem. The branch-and-bound algorithms use progressively more powerful lower-bound functions to prune the O(2^k) search tree. The simpler lower bound takes O(n) time to evaluate given O(sn) global data structures, and the stronger bound takes O((k+s)n) time by virtue of an efficient solution to the problem of finding the second-maximum envelope of a set of piecewise affine curves. In a series of empirical trials we establish the degree to which classes can be separated using our metric and the pruning efficiency of the two branch-and-bound algorithms.
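To make the objective concrete, the following is a minimal sketch rather than the algorithms of the paper. It assumes that the consensus score of a class is the number of symbols that disagree with the most common symbol in each alignment column, and that the two classes may have any non-empty sizes; both are illustrative assumptions, as are the names `consensus_score` and `best_split`. The search is the naive exhaustive enumeration of the O(2^k) partitions, with no lower-bound pruning.

```python
from collections import Counter
from itertools import combinations


def consensus_score(rows):
    """Assumed score: per column, count the rows that differ from the
    column's most common symbol, then sum over all columns."""
    if not rows:
        return 0
    score = 0
    for column in zip(*rows):
        counts = Counter(column)
        score += len(column) - max(counts.values())
    return score


def best_split(rows):
    """Try every split of the aligned rows into two non-empty classes and
    return (total score, indices of class 1, indices of class 2).

    Row 0 is pinned to class 1 so mirror-image splits are not repeated.
    Returns None when there are fewer than two rows.
    """
    k = len(rows)
    best = None
    # 'size' is how many further rows join row 0; at least one row
    # must remain for the second class, hence the upper limit k - 2.
    for size in range(0, k - 1):
        for extra in combinations(range(1, k), size):
            first = {0, *extra}
            second = [i for i in range(k) if i not in first]
            total = (consensus_score([rows[i] for i in sorted(first)])
                     + consensus_score([rows[i] for i in second]))
            if best is None or total < best[0]:
                best = (total, sorted(first), second)
    return best


if __name__ == "__main__":
    aligned = ["ACGT", "ACGT", "AGGA", "AGGA"]
    print(consensus_score(aligned))  # score of the unseparated alignment
    print(best_split(aligned))       # best two-class separation found
```

A branch-and-bound variant would explore the same partitions as a tree, assigning one sequence at a time and abandoning a subtree whenever a lower bound on its best reachable score is no better than the best split found so far.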