An overview of multiple sequence alignment.

Multiple sequence alignment is perhaps the most commonly applied bioinformatics technique. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. In this unit, an overview of multiple sequence alignment techniques is presented, covering a history of nearly 30 years from the early pioneering methods to the current state-of-the-art techniques. Methodological and biological issues and end-user considerations, as well as alignment evaluation issues, are discussed.
