Comparative analysis of multiple protein-sequence alignment methods.

We analyzed 12 global and local multiple protein-sequence alignment methods to evaluate each method's ability to correctly identify the ordered series of motifs shared by all members of a given protein family. The methods were tested on four phylogenetically distributed sets of sequences from the hemoglobin, kinase, aspartic acid protease, and ribonuclease H protein families. The performance of all 12 methods was affected by (1) the number of sequences in the test set, (2) the degree of similarity among the sequences, and (3) the number of indels required to produce a multiple alignment. Global methods generally performed better than local methods at detecting motif patterns.
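The evaluation criterion asks whether an alignment places each motif in register across the whole family. The sketch below checks this by testing whether every sequence puts a known motif in the same alignment columns. It is a minimal Python illustration under stated assumptions, not the scoring procedure used in this study. The names `residue_columns`, `motif_is_aligned`, and `motifs_recovered` are hypothetical. Motif positions are assumed to be known in advance as 0-based offsets into the ungapped sequences, and '-' is assumed to be the gap character. A motif counts as identified only if all of its residues line up exactly.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap character in the aligned sequences


def residue_columns(aligned: str) -> List[int]:
    """Alignment column index of each residue in a gapped sequence."""
    return [col for col, ch in enumerate(aligned) if ch != GAP]


def motif_is_aligned(alignment: List[str], starts: List[int], length: int) -> bool:
    """True if the motif occupies identical alignment columns in every sequence.

    starts[i] is the 0-based start of the motif in the UNGAPPED sequence i.
    """
    reference = None
    for aligned, start in zip(alignment, starts):
        cols = residue_columns(aligned)[start:start + length]
        if len(cols) != length:  # motif runs past the end of this sequence
            return False
        if reference is None:
            reference = cols
        elif cols != reference:  # residues fall in different columns
            return False
    return True


def motifs_recovered(alignment: List[str], motifs: List[Tuple[int, List[int]]]) -> int:
    """Count motifs, given as (length, per-sequence starts), aligned across all sequences."""
    return sum(motif_is_aligned(alignment, starts, length) for length, starts in motifs)


# Two sequences, ACDEFG and ACWDEG, sharing the motifs "AC" and "DE".
motifs = [(2, [0, 0]), (2, [2, 3])]
gapped = ["AC-DEFG", "ACWDE-G"]   # the gap opposite W brings the DE motifs into register
ungapped = ["ACDEFG", "ACWDEG"]   # without an indel the DE motifs are offset by one column
print(motifs_recovered(gapped, motifs), motifs_recovered(ungapped, motifs))
```

The example also illustrates the role of indels noted above: the gapless alignment recovers only one of the two motifs. Requiring an exact column match is deliberately strict. A looser variant could count a motif as recovered when most of its residues share columns.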
