Analysis of protein sequence/structure similarity relationships.

Current analyses of protein sequence/structure relationships have focused on expected similarity relationships for structurally similar proteins. To survey and explore the basis of these relationships, we present a general sequence/structure map that covers all combinations of similarity/dissimilarity relationships and provide novel energetic analyses of these relationships. To aid our analysis, we divide protein relationships into four categories: expected/unexpected similarity (S and S(?)) and expected/unexpected dissimilarity (D and D(?)) relationships. In the expected similarity region S, we show that trends in the sequence/structure relation can be derived based on the requirement of protein stability and the energetics of sequence and structural changes. Specifically, we derive a formula relating sequence and structural deviations to a parameter characterizing protein stiffness; the formula fits the data reasonably well. We suggest that the absence of data in region S(?) (high structural but low sequence similarity) is due to unfavorable energetics. In contrast to region S, region D(?) (high sequence but low structural similarity) is well-represented by proteins that can accommodate large structural changes. Our analyses indicate that there are several categories of similarity relationships and that protein energetics provide a basis for understanding these relationships.
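The four-category division described above can be illustrated with a minimal sketch. It assumes each protein pair is summarized by a fractional sequence identity and a structural RMSD; the function name and both cutoff values are hypothetical placeholders chosen for illustration, not the thresholds used in the study.

```python
from enum import Enum


class Region(Enum):
    S = "expected similarity"              # high sequence, high structural similarity
    S_Q = "unexpected similarity S(?)"     # high structural, low sequence similarity
    D = "expected dissimilarity"           # low sequence, low structural similarity
    D_Q = "unexpected dissimilarity D(?)"  # high sequence, low structural similarity


def classify_pair(seq_identity, rmsd, seq_cutoff=0.30, rmsd_cutoff=3.0):
    """Assign a protein pair to one of the four regions of the map.

    seq_identity: fractional sequence identity in [0, 1].
    rmsd: structural deviation in angstroms (lower means more similar).
    seq_cutoff, rmsd_cutoff: hypothetical cutoffs, not taken from the paper.
    """
    seq_similar = seq_identity >= seq_cutoff
    struct_similar = rmsd <= rmsd_cutoff
    if seq_similar and struct_similar:
        return Region.S
    if struct_similar:
        return Region.S_Q
    if seq_similar:
        return Region.D_Q
    return Region.D
```

Under these assumed cutoffs, `classify_pair(0.95, 12.0)` falls in region D(?), the case the abstract associates with proteins that can accommodate large structural changes.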
