CSmetaPred: a consensus method for prediction of catalytic residues

Background: Knowledge of catalytic residues can play an essential role in elucidating the mechanistic details of an enzyme. However, experimental identification of catalytic residues is tedious and time-consuming, and computational predictions can expedite it. Despite significant progress in active-site prediction methods, one remaining issue is where putative catalytic residues are placed in the ranking of all residues. To improve both the ranking of catalytic residues and prediction accuracy, we have developed a meta-approach-based method, CSmetaPred. In this approach, residues are ranked by the mean of normalized residue scores derived from four well-known catalytic residue predictors. In a second meta-predictor, CSmetaPred_poc, the mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance.

Results: Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. Visual and quantitative analysis of the ROC and PR curves shows that the meta-predictors outperform their constituent methods and that CSmetaPred_poc is the best of the evaluated methods. For instance, on the CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure for the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. When residues ranked ≤20 are classified as positives in a binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within the top 20 ranks for ~73% of enzymes. Furthermore, benchmarking on comparative modelled structures showed that predictions made from models are better than sequence-only predictions. These analyses suggest that CSmetaPred_poc is able to place putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization.

Conclusions: The benchmarking studies showed that a meta-approach combining residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy and also improve the ranked positions of known catalytic residues. Hence, such predictions can help experimentalists prioritize residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as a web server at: http://14.139.227.206/csmetapred/.
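The score-averaging and ranking step described in the Background, together with the rank ≤20 binary classification used in the Results, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how scores are normalized, so z-score normalization is an assumption here, as is the convention that a higher score means "more likely catalytic". The function names, predictor labels, and toy scores are all hypothetical, and the pocket-based refinement of CSmetaPred_poc is not modelled.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Normalize one predictor's per-residue scores (assumed scheme: z-score)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant score vector carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def meta_rank(predictor_scores):
    """Average normalized scores across predictors and rank residues.

    predictor_scores maps a predictor name to a list of per-residue scores;
    all lists must have the same length. Returns (mean_scores, ranks),
    where rank 1 is the residue with the highest mean score.
    """
    normalized = [zscore(s) for s in predictor_scores.values()]
    mean_scores = [mean(column) for column in zip(*normalized)]
    order = sorted(range(len(mean_scores)),
                   key=lambda i: mean_scores[i], reverse=True)
    ranks = [0] * len(mean_scores)
    for rank, residue_index in enumerate(order, start=1):
        ranks[residue_index] = rank
    return mean_scores, ranks


def top_k_accuracy(ranks, catalytic, k=20):
    """Accuracy when residues ranked <= k are called catalytic."""
    correct = sum((r <= k) == (i in catalytic) for i, r in enumerate(ranks))
    return correct / len(ranks)


# Toy example: four hypothetical predictors scoring a six-residue protein.
toy_scores = {
    "predictor_a": [0.1, 0.9, 0.2, 0.8, 0.1, 0.3],
    "predictor_b": [1, 9, 2, 7, 1, 2],
    "predictor_c": [0.0, 0.5, 0.1, 0.6, 0.0, 0.1],
    "predictor_d": [10, 80, 20, 90, 10, 30],
}
mean_scores, ranks = meta_rank(toy_scores)
print(ranks)
print(top_k_accuracy(ranks, catalytic={1, 3}, k=2))
```

Normalizing each predictor before averaging keeps a predictor with a wide raw score range from dominating the mean. A real pipeline would replace the toy lists with the per-residue outputs of the four constituent predictors.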
