Structure-derived substitution matrices for alignment of distantly related sequences.

Sequence alignment is a standard method for inferring evolutionary, structural, and functional relationships among sequences. Alignment quality depends on the substitution matrix used. Here we derive matrices from superimpositions of protein pairs that have similar structure but low or no sequence similarity. In a performance test, the matrices are compared with 12 previously published matrices. The structure-derived matrices prove applicable to comparisons of distantly related sequences. We also investigate how the evolutionary relationship between the proteins in a pair influences alignment accuracy.
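The abstract does not state how substitution counts are turned into scores. As an illustration only, the sketch below assumes the common log-odds formulation: residue pairs occupying equivalent positions in a structural superimposition are counted, and each score is the base-2 logarithm of the observed pair frequency over the frequency expected from the residue background. The function name `log_odds_matrix`, the `pseudocount` parameter, and the toy input are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Return {(a, b): score in bits} from structurally equivalent residue pairs.

    aligned_pairs: iterable of (residue_a, residue_b) tuples, one per pair of
    superimposed positions. The log-odds form is an assumption for this sketch,
    not necessarily the procedure used in the paper.
    """
    pairs = list(aligned_pairs)
    alphabet = sorted({r for pair in pairs for r in pair})

    # Count each observation in both orders so the matrix is symmetric.
    counts = Counter()
    for a, b in pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1

    # Add a pseudocount to every ordered cell to avoid log(0) for unseen pairs.
    total = 2 * len(pairs) + pseudocount * len(alphabet) ** 2
    q = {
        (a, b): (counts[(a, b)] + pseudocount) / total
        for a in alphabet
        for b in alphabet
    }

    # Background frequency of each residue is the marginal of the pair table.
    p = {a: sum(q[(a, b)] for b in alphabet) for a in alphabet}

    # Score = log2(observed pair frequency / frequency expected by chance).
    return {(a, b): math.log2(q[(a, b)] / (p[a] * p[b])) for (a, b) in q}


# Toy input: mostly conserved positions with a few A<->L exchanges.
toy_pairs = [("A", "A")] * 8 + [("L", "L")] * 8 + [("A", "L")] * 2
matrix = log_odds_matrix(toy_pairs)
```

With this toy input, conserved pairs receive positive scores and the rarer A/L exchange a negative one. Real matrices are built over all 20 amino acids and far more superimposed positions, and are usually scaled and rounded to integers.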
