Bounds for parametric sequence comparison

Abstract: We consider the problem of computing a global alignment between two or more sequences subject to varying mismatch and indel penalties. We prove a tight $3(n/2\pi)^{2/3} + O(n^{1/3} \log n)$ bound on the worst-case number of distinct optimum alignments for two sequences of length $n$ as the parameters are varied. This refines an $O(n^{2/3})$ upper bound by Gusfield et al., answering a question posed by Pevzner and Waterman. Our lower bound requires an unbounded alphabet. For strings over a binary alphabet, we prove an $\Omega(n^{1/2})$ lower bound. For the parametric global alignment of $k \geq 2$ sequences under sum-of-pairs scoring, we prove a $3\left(\binom{k}{2} n / 2\pi\right)^{2/3} + O(k^{2/3} n^{1/3} \log n)$ upper bound on the number of distinct optimality regions and an $\Omega(n^{2/3})$ lower bound, partially answering a problem of Pevzner. Based on experimental evidence, we conjecture that for two random sequences, the number of optimality regions is approximately $\sqrt{n}$ with high probability.
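To give a feel for the scale of these bounds, the following is a minimal sketch that evaluates only the leading term $3\left(\binom{k}{2} n / 2\pi\right)^{2/3}$ stated in the abstract. The function name `leading_term` is a hypothetical helper introduced here for illustration, and the lower-order $O(k^{2/3} n^{1/3} \log n)$ term is deliberately ignored, so the values are rough estimates rather than exact region counts.

```python
import math


def leading_term(n: int, k: int = 2) -> float:
    """Leading term 3 * (C(k, 2) * n / (2 * pi)) ** (2/3) of the bound.

    Hypothetical helper: it drops the lower-order
    O(k^(2/3) n^(1/3) log n) term, so it is only an approximation.
    """
    pairs = math.comb(k, 2)  # number of sequence pairs; equals 1 when k == 2
    return 3.0 * (pairs * n / (2.0 * math.pi)) ** (2.0 / 3.0)


if __name__ == "__main__":
    # Compare the two-sequence and three-sequence leading terms.
    for n in (100, 1000, 10000):
        print(n, round(leading_term(n), 1), round(leading_term(n, k=3), 1))
```

For $k = 2$ the binomial factor is 1, so the expression reduces to the two-sequence bound $3(n/2\pi)^{2/3}$.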

[1] M. S. Waterman et al. Sequence alignment and penalty choice. Review of concepts, case studies and implications. Journal of Molecular Biology, 1994.

[2] E. Lander et al. Parametric sequence comparisons. Proceedings of the National Academy of Sciences of the United States of America, 1992.

[3] Dan Gusfield et al. Parametric optimization of sequence alignment. SODA '92, 1992.

[4] Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. 1997.

[5] Petra Mutzel et al. Computational Molecular Biology. 1996.

[6] P. A. Pevzner et al. Open combinatorial problems in computational molecular biology. Proceedings Third Israel Symposium on the Theory of Computing and Systems, 1995.

[7] P. Pevzner et al. Computational Molecular Biology. 2000.

[8] D. Lipman et al. The multiple sequence alignment problem in biology. 1988.

[9] D. Gusfield et al. Parametric and inverse-parametric sequence alignment with XPARAL. Methods in Enzymology, 1996.

[10] David Sankoff et al. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. 1983.

[11] Shane S. Sturrock et al. Time Warps, String Edits, and Macromolecules – The Theory and Practice of Sequence Comparison. David Sankoff and Joseph Kruskal. ISBN 1-57586-217-4. Price £13.95 (US$22.95). 2000.

[12] T. Apostol. Introduction to Analytic Number Theory. 1976.

[13] Pavel A. Pevzner et al. Parametric Recomputing in Alignment Graphs. CPM, 1994.

[14] David Fernández-Baca et al. Parametric Multiple Sequence Alignment and Phylogeny Construction. CPM, 2000.