Protein structure based prediction of catalytic residues

Background: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation.

Results: We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the 111 annotated catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods.

Conclusions: We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be captured more simply and more effectively with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
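
The enrichment, sensitivity and precision quoted in the Results follow directly from the four counts reported for the independent test set. A minimal sketch of that arithmetic is given below; the function name `enrichment_summary` and its parameter names are illustrative assumptions, not part of the published method, and enrichment is taken here to mean precision divided by the background fraction of catalytic residues.

```python
def enrichment_summary(total_residues, total_catalytic, predicted, predicted_catalytic):
    """Summarise a residue-level prediction from four counts.

    Hypothetical helper for illustration only; the names are not from the paper.
    """
    # Fraction of annotated catalytic residues that were recovered.
    sensitivity = predicted_catalytic / total_catalytic
    # Fraction of predicted residues that are annotated as catalytic.
    precision = predicted_catalytic / predicted
    # Fraction of catalytic residues expected when picking residues at random.
    background = total_catalytic / total_residues
    # Assumed definition: how much denser catalytic residues are in the output.
    enrichment = precision / background
    return sensitivity, precision, enrichment


# Counts reported for the independent test set of 29 structures.
sensitivity, precision, enrichment = enrichment_summary(
    total_residues=9262,
    total_catalytic=111,
    predicted=411,
    predicted_catalytic=70,
)

print(f"sensitivity = {sensitivity:.0%}")
print(f"precision   = {precision:.0%}")
print(f"enrichment  = {enrichment:.1f}-fold")
```

With the reported counts this reproduces the 63% sensitivity, 17% precision and roughly 14-fold enrichment stated above.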
