Optimal sequence alignment using affine gap costs

When comparing two biological sequences, it is often desirable for a gap to be assigned a cost not directly proportional to its length. If affine gap costs are employed, in other words if opening a gap costs v and each null in the gap costs u, the algorithm of Gotoh (1982, J. molec. Biol. 162, 705) finds the minimum cost of aligning two sequences in order MN steps. Gotoh's algorithm attempts to find only one from among possibly many optimal (minimum-cost) alignments, but does not always succeed. This paper provides an example for which this part of Gotoh's algorithm fails and describes an algorithm that finds all and only the optimal alignments. This modification of Gotoh's algorithm still requires order MN steps. A more precise form of path graph than previously used is needed to represent accurately all optimal alignments for affine gap costs.
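As an illustration of the cost model described above, the following is a minimal sketch of a three-matrix dynamic program in the style of Gotoh's recurrence, in which a gap of k nulls costs v + u*k. It computes only the minimum alignment cost in order MN steps; it does not perform the traceback, and it is not the modified algorithm of this paper for enumerating all optimal alignments. The function name, the default values of v and u, and the substitution costs (0 for a match, a fixed `mismatch` otherwise) are assumptions made for the example.

```python
import math


def affine_alignment_cost(a, b, v=2.0, u=1.0, mismatch=1.0):
    """Minimum cost of globally aligning a and b when a gap of k nulls costs v + u*k.

    Substitution costs are an assumption of this sketch:
    0 for identical symbols, `mismatch` otherwise.
    """
    m, n = len(a), len(b)
    inf = math.inf
    # D[i][j]: minimum cost of aligning a[:i] with b[:j]
    # P[i][j]: same, restricted to alignments ending with a[i-1] opposite a null
    # Q[i][j]: same, restricted to alignments ending with b[j-1] opposite a null
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    P = [[inf] * (n + 1) for _ in range(m + 1)]
    Q = [[inf] * (n + 1) for _ in range(m + 1)]

    D[0][0] = 0.0
    for i in range(1, m + 1):
        P[i][0] = v + u * i  # a single gap covering a[:i]
        D[i][0] = P[i][0]
    for j in range(1, n + 1):
        Q[0][j] = v + u * j  # a single gap covering b[:j]
        D[0][j] = Q[0][j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Either open a new gap (pay v + u) or extend the current one (pay u).
            P[i][j] = min(D[i - 1][j] + v + u, P[i - 1][j] + u)
            Q[i][j] = min(D[i][j - 1] + v + u, Q[i][j - 1] + u)
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub, P[i][j], Q[i][j])
    return D[m][n]


if __name__ == "__main__":
    # One gap of two nulls (v + 2u) is cheaper than two gaps of one null each.
    print(affine_alignment_cost("ACGT", "AT"))
```

With these assumed costs, deleting two adjacent symbols as a single gap is cheaper than deleting them as two separate gaps, which is the behaviour that a cost proportional to gap length cannot express.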

[1] Philip Taylor, et al. A fast homology program for aligning biological sequences, 1984, Nucleic Acids Res.

[2] O. Gotoh. An improved algorithm for matching biological sequences, 1982, Journal of molecular biology.

[3] T. Taniguchi, et al. Structure and expression of a cloned cDNA for human interleukin-2, 1983, Nature.

[4] S F Altschul, et al. A nonlinear measure of subalignment similarity and its significance levels, 1986, Bulletin of mathematical biology.

[5] T. Smith, et al. Optimal sequence alignments, 1983, Proceedings of the National Academy of Sciences of the United States of America.

[6] P. Sellers. On the Theory and Computation of Evolutionary Distances, 1974.

[7] W. A. Beyer, et al. Some Biological Sequence Metrics, 1976.

[8] K. Arai, et al. Use of a cDNA expression vector for isolation of mouse interleukin 2 cDNA clones: expression of T-cell growth-factor activity after transfection of monkey cells, 1985, Proceedings of the National Academy of Sciences of the United States of America.

[9] M S Waterman, et al. Efficient sequence alignment algorithms, 1984, Journal of theoretical biology.

[10] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of molecular biology.

[11] Esko Ukkonen, et al. On Approximate String Matching, 1983, FCT.