ResBoost: characterizing and predicting catalytic residues in enzymes

Background: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed.

Results: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA).

Conclusion: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
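As a rough illustration of the idea of combining threshold-style rules of thumb by boosting, the following minimal sketch runs plain AdaBoost over single-feature threshold rules and then shifts the decision cutoff to trade sensitivity against false positive rate. It is not the ResBoost implementation: it omits Alternating Decision Trees entirely, and the two features, the toy values, the labels, and all function names are assumptions invented for this example.

```python
import math

# Hypothetical toy data: each residue is (conservation, solvent_accessibility),
# both scaled to [0, 1]. Values and labels are invented for illustration only.
X = [(0.95, 0.30), (0.90, 0.10), (0.85, 0.45), (0.60, 0.50),
     (0.40, 0.20), (0.30, 0.80), (0.55, 0.05), (0.20, 0.60)]
y = [1, 1, 1, -1, -1, -1, -1, -1]  # +1 = catalytic, -1 = non-catalytic


def rule_vote(x, feature, threshold, polarity):
    """One rule of thumb: vote `polarity` if the feature exceeds the threshold."""
    return polarity if x[feature] > threshold else -polarity


def train_adaboost(X, y, rounds=3):
    """Plain AdaBoost over threshold rules; returns (alpha, feature, threshold, polarity) tuples."""
    n = len(X)
    weights = [1.0 / n] * n
    # Candidate rules: every observed feature value used as a threshold.
    candidates = sorted({(f, row[f]) for row in X for f in range(len(row))})
    ensemble = []
    for _ in range(rounds):
        best = None
        for f, t in candidates:
            for polarity in (1, -1):
                # Weighted error of this rule on the training set.
                err = sum(w for w, row, label in zip(weights, X, y)
                          if rule_vote(row, f, t, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
        err, f, t, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect rules
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, polarity))
        # Up-weight misclassified residues, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * label * rule_vote(row, f, t, polarity))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble, x):
    """Weighted vote of all selected rules."""
    return sum(a * rule_vote(x, f, t, p) for a, f, t, p in ensemble)


def predict(ensemble, x, cutoff=0.0):
    """Raising the cutoff lowers the false positive rate at the cost of sensitivity."""
    return 1 if score(ensemble, x) > cutoff else -1


def sensitivity_and_fpr(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    return tp / (tp + fn), fp / (fp + tn)


model = train_adaboost(X, y)
```

Calling `sensitivity_and_fpr(y, [predict(model, x, cutoff) for x in X])` for several values of `cutoff` traces out the sensitivity versus false positive rate trade-off on this toy set; on real data the evaluation would be done on held-out enzymes, for example by cross-validation as the abstract describes.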
