Multiple alignment, communication cost, and graph matching

Multiple sequence alignment is an important problem in computational molecular biology. Dynamic programming for optimal multiple alignment requires too much time to be practical. Although many algorithms for suboptimal alignment have been suggested, no algorithms with performance guarantees were known until recently. This paper gives a computationally efficient approximation algorithm for multiple alignment whose guaranteed error bound equals the normalized communication cost of a corresponding graph. Recently, Altschul and Lipman [SIAM J. Appl. Math., 49 (1989), pp. 197–209] used suboptimal alignments to reduce the computational complexity of the optimal alignment algorithm. This paper develops the Altschul–Lipman approach further and demonstrates that bounds for the optimal multiple alignment of k sequences can be derived from a solution of the maximum weighted matching problem in a k-vertex graph. Fast maximum matching algorithms allow efficient implementation of dynamic bounds for the multiple alignment ...
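
The two ingredients named above, pairwise alignment costs and a maximum weighted matching in a k-vertex graph, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state how edge weights are defined or how the matching is turned into a bound, so unit-cost edit distance stands in for the pairwise alignment cost and an exhaustive search stands in for a fast matching algorithm. The function names are hypothetical and not taken from the paper.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance between two strings (stand-in pairwise cost)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def pairwise_costs(seqs):
    """Edge weights of the complete k-vertex graph, keyed by (i, j) with i < j."""
    return {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


def max_weight_matching(k, weight):
    """Exhaustive maximum weighted matching; only suitable for small k."""
    def best(free):
        if len(free) < 2:
            return 0, []
        first, rest = free[0], free[1:]
        top = best(rest)  # option: leave `first` unmatched
        for idx, other in enumerate(rest):
            w, pairs = best(rest[:idx] + rest[idx + 1:])
            cand = (w + weight[(first, other)], [(first, other)] + pairs)
            if cand[0] > top[0]:
                top = cand
        return top

    return best(tuple(range(k)))


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TTTT", "TTTA"]  # toy input, not from the paper
    costs = pairwise_costs(seqs)
    total, pairs = max_weight_matching(len(seqs), costs)
    print(total, pairs)
```

The exhaustive search grows very quickly with k; a polynomial-time matching algorithm would replace it in any practical setting, which is the role the abstract assigns to fast maximum matching algorithms.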
