Protein multiple sequence alignment.

Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the problem has been studied for several decades, recent studies have made considerable progress in improving the accuracy or scalability of pairwise and multiple alignment tools, and in expanding the scope of tasks that an alignment program can handle. In this chapter, we review the state of the art in protein sequence alignment and provide practical advice for users of alignment tools.
