Prediction of enzyme function based on 3D templates of evolutionarily important amino acids

Background
Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but these often have no known molecular function. One approach to recover this information is to use 3D templates: structure-function motifs that consist of a few functionally critical amino acids and that may suggest functional similarity when geometrically matched to other structures. Because experimentally determined functional sites are too scarce to define 3D templates on a large scale, this work tests a computational strategy for selecting the residues that make up a 3D template.

Results
Using evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half of them taken from the PSI, and searched for matches in a non-redundant structure database. On average, each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission (EC) digits with the template's enzyme of origin. In many cases (61%), a single most likely function could be predicted as the annotation with the most matches, and in these cases this plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotation. When matches are required both to match the 3D template geometrically and to be sequence homologs found by BLAST or PSI-BLAST, annotation accuracy is higher than with either method alone, especially at lower sequence identity, where homology-based annotations are least reliable.

Conclusion
These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Because the ETA method, unlike other 3D template approaches, does not require experimental knowledge of the catalytic mechanism, it should prove a useful, large-scale, and general adjunct to other methods for deciphering protein function across the structural proteome.
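The plurality vote and the combined template-plus-homology filter described in the abstract's results can be sketched as below. This is a minimal illustration, not the ETA pipeline itself: it assumes each template match has already been reduced to a protein identifier and an EC number, and the function names (`ec_prefix`, `plurality_vote`, `predict_function`), the example identifiers, and the rule that a tie for first place yields no prediction are all assumptions made for illustration. Comparison is done at the first three EC digits, the level used in the reported results.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Set


def ec_prefix(ec: str, depth: int = 3) -> str:
    """Truncate an EC number such as '3.4.21.4' to its first `depth` fields ('3.4.21')."""
    return ".".join(ec.split(".")[:depth])


def plurality_vote(match_ecs: Sequence[str], depth: int = 3) -> Optional[str]:
    """Return the EC prefix carried by the most matches, or None if there is no unique winner."""
    counts = Counter(ec_prefix(ec, depth) for ec in match_ecs)
    if not counts:
        return None  # no matches, so no prediction
    ranked = counts.most_common()
    # Assumption: a tie for first place means no single most likely function.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def predict_function(
    matches: Dict[str, str],
    homologs: Optional[Set[str]] = None,
    depth: int = 3,
) -> Optional[str]:
    """Vote over template matches given as protein ID -> EC number (hypothetical format).

    If `homologs` is given (e.g. IDs of BLAST or PSI-BLAST hits), only matches that are
    also sequence homologs take part in the vote, i.e. the combined requirement.
    """
    if homologs is not None:
        matches = {pid: ec for pid, ec in matches.items() if pid in homologs}
    return plurality_vote(list(matches.values()), depth)


if __name__ == "__main__":
    # Hypothetical match list for one template; IDs and EC numbers are illustrative only.
    matches = {"1abcA": "3.4.21.4", "2xyzB": "3.4.21.62", "3pqrA": "3.1.1.3"}
    print(predict_function(matches))                               # 3.4.21
    print(predict_function(matches, homologs={"3pqrA"}))           # 3.1.1
    print(predict_function(matches, homologs={"1abcA", "3pqrA"}))  # None (tie)
```

Returning `None` on ties mirrors the abstract's distinction between cases where a single most likely function can be predicted and cases where it cannot; a real pipeline might instead break ties by match quality.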