Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution.

Detection of recurring three-dimensional side-chain patterns is a potential means of inferring protein function. This paper presents a new method for detecting such patterns and discusses its implications. The method detects side-chain patterns without any prior knowledge of function, requiring only protein structure data and associated multiple sequence alignments. A recursive, depth-first search algorithm finds all possible groups of identical amino acids common to two protein structures, independent of sequence order. The search is strongly restricted by distance constraints and by ignoring amino acids unlikely to be involved in protein function. A weighted root-mean-square deviation (RMSD) between equivalenced groups of amino acids is used as a measure of similarity. The statistical significance of each RMSD is assigned by reference to a distribution fitted to simulated data. To test the method on known examples, searches are performed with the Ser/His/Asp catalytic triad, a His/His porphyrin-binding pattern, and the zinc-finger Cys/Cys/His/His pattern. An all-against-all comparison of representatives from the Structural Classification of Proteins (SCOP) database reveals several new examples of evolutionary convergence to common side-chain patterns within different tertiary folds and in different orders along the sequence. These include a di-zinc-binding Asp/Asp/His/His/Ser pattern common to alkaline phosphatase and bacterial aminopeptidase, and an Asp/Glu/His/His/Asn/Asn pattern common to the active sites of DNase I and endocellulase E1. Implications for protein evolution, function prediction and the rational design of functional regulators are discussed.
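
The search step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each residue is reduced to a single representative side-chain point, and the function name `find_common_patterns`, the 12.0 Å distance cut-off, the 1.5 Å tolerance and the minimum group size are all hypothetical choices made for illustration. Filtering by sequence conservation, the weighted RMSD and the significance estimate are not shown.

```python
from math import dist  # Python 3.8+

def find_common_patterns(struct_a, struct_b, max_dist=12.0, tol=1.5, min_size=3):
    """Depth-first search for groups of same-type residues with matching geometry.

    Each structure is a list of (residue_type, (x, y, z)) tuples. The result is
    a list of groups; each group is a sorted list of (index_in_a, index_in_b).
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Candidate equivalences: identical amino-acid types, in any sequence order.
    pairs = [(i, j)
             for i, (type_a, _) in enumerate(struct_a)
             for j, (type_b, _) in enumerate(struct_b)
             if type_a == type_b]

    def compatible(p, q):
        (i, j), (k, l) = p, q
        if i == k or j == l:          # a residue may be used only once
            return False
        d_a = dist(struct_a[i][1], struct_a[k][1])
        d_b = dist(struct_b[j][1], struct_b[l][1])
        # Distance constraints: residues lie close together in both
        # structures, and the two inter-residue distances agree.
        return d_a <= max_dist and d_b <= max_dist and abs(d_a - d_b) <= tol

    found = set()

    def extend(group, start):
        grew = False
        for n in range(start, len(pairs)):
            if all(compatible(pairs[n], p) for p in group):
                grew = True
                extend(group + [pairs[n]], n + 1)   # recurse depth-first
        if not grew and len(group) >= min_size:
            found.add(frozenset(group))

    extend([], 0)
    # Keep only groups that are not contained in a larger reported group.
    maximal = [g for g in found if not any(g < other for other in found)]
    return sorted(sorted(g) for g in maximal)

# Toy coordinates (not real protein data): the same Ser/His/Asp geometry,
# translated and listed in a different order in the second structure.
struct_a = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)),
            ("ASP", (3.0, 4.0, 0.0)), ("GLY", (30.0, 30.0, 30.0))]
struct_b = [("ASP", (13.0, 4.0, 0.0)), ("SER", (10.0, 0.0, 0.0)),
            ("HIS", (13.0, 0.0, 0.0)), ("ALA", (50.0, 0.0, 0.0))]

matches = find_common_patterns(struct_a, struct_b)
print(matches)  # [[(0, 1), (1, 2), (2, 0)]]
```

Because candidate pairs are built from residue type alone, the triad is recovered even though it appears in a different order in the two toy structures. In a full treatment, each reported group would then be superimposed and scored by RMSD.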
