A Sequence Alignment Algorithm with an Arbitrary Gap Penalty Function

An algorithm for aligning biological sequences is presented that adapts the sequence generating function approach used in the statistical mechanics of biopolymers. The algorithm uses recursion relations derived from a partition function formalism for alignment probabilities. It is implemented in a dynamic programming format that closely resembles the forward algorithm used in hidden Markov models (HMMs). Sequences or structures are aligned according to the statistically dominant alignment path, so the method is referred to as the SDP algorithm. An advantage of this method over previous ones is that more complicated and physically realistic gap penalty functions can be incorporated easily. The performance of the algorithm is investigated in a case study that aligns the variable regions of the heavy and light chains of an immunoglobulin.
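
The recursions themselves are not reproduced in this abstract. The Python below is a minimal sketch of how a forward-style partition function can accept an arbitrary gap penalty g(k). It assumes a global alignment, a weight of exp(beta*s(x, y)) for each aligned pair, and a weight of exp(-beta*g(k)) for each gap run of length k. Each maximal gap run is summed over as a single block, which costs O(nm(n+m)) time. Several elements are illustrative assumptions and are not taken from the paper: the function names (`boltzmann`, `forward_tables`, `partition_function`, `dominant_path`), the three-state decomposition, and the greedy traceback used to extract a dominant path.

```python
import math
from typing import Callable, List, Tuple

Table = List[List[float]]
PairWeight = Callable[[str, str], float]
GapWeight = Callable[[int], float]


def boltzmann(score: Callable[[str, str], float],
              gap_penalty: Callable[[int], float],
              beta: float = 1.0) -> Tuple[PairWeight, GapWeight]:
    """Turn additive scores/penalties into multiplicative Boltzmann weights."""
    def pair_weight(x: str, y: str) -> float:
        return math.exp(beta * score(x, y))

    def gap_weight(k: int) -> float:
        return math.exp(-beta * gap_penalty(k))

    return pair_weight, gap_weight


def forward_tables(a: str, b: str, pair_weight: PairWeight,
                   gap_weight: GapWeight) -> Tuple[Table, Table, Table]:
    """Summed weights of all global alignments of prefixes a[:i], b[:j].

    M[i][j]: alignments ending with a[i-1] paired with b[j-1].
    X[i][j]: alignments ending with a maximal run of a-residues against gaps.
    Y[i][j]: alignments ending with a maximal run of b-residues against gaps.
    A run of length k contributes gap_weight(k) as one block (hypothetical
    three-state scheme), so the gap penalty may be any function of k.
    """
    n, m = len(a), len(b)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # weight of the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = M[i - 1][j - 1] + X[i - 1][j - 1] + Y[i - 1][j - 1]
                M[i][j] = pair_weight(a[i - 1], b[j - 1]) * prev
            # A run never follows a run of the same kind; otherwise one gap
            # of length k would also be counted as several shorter gaps.
            X[i][j] = sum((M[i - k][j] + Y[i - k][j]) * gap_weight(k)
                          for k in range(1, i + 1))
            Y[i][j] = sum((M[i][j - k] + X[i][j - k]) * gap_weight(k)
                          for k in range(1, j + 1))
    return M, X, Y


def partition_function(a: str, b: str, pair_weight: PairWeight,
                       gap_weight: GapWeight) -> float:
    """Total weight Z of all global alignments of a and b."""
    M, X, Y = forward_tables(a, b, pair_weight, gap_weight)
    return M[-1][-1] + X[-1][-1] + Y[-1][-1]


def dominant_path(a: str, b: str, pair_weight: PairWeight,
                  gap_weight: GapWeight) -> Tuple[str, str]:
    """Assumed greedy traceback: follow the largest contribution to Z."""
    tables = dict(zip("MXY", forward_tables(a, b, pair_weight, gap_weight)))
    i, j = len(a), len(b)
    state = max("MXY", key=lambda s: tables[s][i][j])
    top: List[str] = []
    bottom: List[str] = []
    while i > 0 or j > 0:
        if state == "M":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
            state = max("MXY", key=lambda s: tables[s][i][j])
        elif state == "X":  # a[i-k:i] against k gaps, preceded by M or Y
            k, state = max(((k, s) for k in range(1, i + 1) for s in "MY"),
                           key=lambda t: tables[t[1]][i - t[0]][j] * gap_weight(t[0]))
            top.append(a[i - k:i])
            bottom.append("-" * k)
            i -= k
        else:  # state == "Y": k gaps against b[j-k:j], preceded by M or X
            k, state = max(((k, s) for k in range(1, j + 1) for s in "MX"),
                           key=lambda t: tables[t[1]][i][j - t[0]] * gap_weight(t[0]))
            top.append("-" * k)
            bottom.append(b[j - k:j])
            j -= k
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Illustrative scores only: +2 match, -1 mismatch, logarithmic gap penalty.
    pw, gw = boltzmann(lambda x, y: 2.0 if x == y else -1.0,
                       lambda k: 3.0 + 1.5 * math.log(k), beta=1.0)
    print(partition_function("HEAGAWGHEE", "PAWHEAE", pw, gw))
    print(dominant_path("HEAGAWGHEE", "PAWHEAE", pw, gw))
```

The gap penalty enters only through `gap_weight(k)`. You can therefore replace the logarithmic penalty with any other function of k without changing the recursions. For long sequences the raw products underflow, so a practical version would need to work in log space.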
