On Near-Optimal Alignments of Biological Sequences

A near-optimal alignment of a pair of sequences is an alignment whose score lies within a specified neighborhood of the optimal score. We present an efficient method for representing all alignments whose score is within any given delta of the optimal score. The representation is a compact graph that makes it easy to impose additional biological constraints and to select one desirable alignment from the large set of candidates. We study the combinatorial nature of near-optimal alignments and define a set of "canonical" near-optimal alignments. We then show how to enumerate near-optimal alignments efficiently in order of their score and how to count them. When applied to comparisons of two distantly related proteins, the method reveals that the regions most conserved among the near-optimal alignments are the highly structured regions of the proteins. We also show that counting the number of near-optimal alignments as a function of the distance from the optimal score lets us select a set of parameters that best constrains the biologically relevant alignments.
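To make the idea of a graph of near-optimal alignments concrete, the following is a minimal sketch, not the construction from this paper. It assumes a simple global-alignment scoring scheme with a linear gap penalty (the `match`, `mismatch`, and `gap` values are illustrative assumptions) and uses the standard forward/backward dynamic programming idea: an edge of the alignment grid is kept if the best alignment passing through it scores within `delta` of the optimum. The function name `near_optimal_edges` is hypothetical.

```python
def near_optimal_edges(a, b, delta, match=1, mismatch=-1, gap=-1):
    """Return (optimal score, edges lying on some alignment within delta of it).

    Scoring values are illustrative assumptions, not taken from the paper.
    Nodes are grid points (i, j); an edge is a pair of nodes.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i] with b[j].
        return match if a[i] == b[j] else mismatch

    # fwd[i][j]: best score of aligning the prefixes a[:i] and b[:j].
    fwd = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        fwd[i][0] = i * gap
    for j in range(1, m + 1):
        fwd[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fwd[i][j] = max(fwd[i - 1][j - 1] + sub(i - 1, j - 1),
                            fwd[i - 1][j] + gap,
                            fwd[i][j - 1] + gap)

    # bwd[i][j]: best score of aligning the suffixes a[i:] and b[j:].
    bwd = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        bwd[i][m] = (n - i) * gap
    for j in range(m - 1, -1, -1):
        bwd[n][j] = (m - j) * gap
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            bwd[i][j] = max(bwd[i + 1][j + 1] + sub(i, j),
                            bwd[i + 1][j] + gap,
                            bwd[i][j + 1] + gap)

    opt = fwd[n][m]
    threshold = opt - delta
    edges = []
    for i in range(n + 1):
        for j in range(m + 1):
            # Keep an edge if the best alignment through it reaches the threshold.
            if i < n and j < m and fwd[i][j] + sub(i, j) + bwd[i + 1][j + 1] >= threshold:
                edges.append(((i, j), (i + 1, j + 1)))
            if i < n and fwd[i][j] + gap + bwd[i + 1][j] >= threshold:
                edges.append(((i, j), (i + 1, j)))
            if j < m and fwd[i][j] + gap + bwd[i][j + 1] >= threshold:
                edges.append(((i, j), (i, j + 1)))
    return opt, edges


if __name__ == "__main__":
    score, graph = near_optimal_edges("GATTACA", "GCATGCA", delta=1)
    print(score, len(graph))
```

Note that this sketch only identifies which edges participate in at least one near-optimal alignment. A path through the retained edges is not guaranteed to be near-optimal itself, so enumerating or counting alignments by score needs additional bookkeeping beyond what is shown here.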
