Predicting enzymatic function from global binding site descriptors

With the number of solved protein structures rising, computational techniques for automatic functional annotation of proteins and their classification into families are of high scientific interest. DoGSiteScorer automatically calculates global descriptors for self-predicted pockets from the 3D structure of a protein. Using a support vector machine (SVM), protein function predictors are built on three levels of increasing granularity, based on descriptors of 26,632 pockets from enzymes with known structure and enzyme classification. The SVM models generalize the available descriptor space for each enzyme class, subclass, and substrate-specific sub-subclass. In cross-validation studies, the correct main class is predicted with an accuracy of 68.2%, and accuracies for the six subclasses range from 62.8% to 80.9%. The substrate-specific recall rate for a kinase subset is 53.8%. Furthermore, application studies demonstrate that the method can predict the function of uncharacterized proteins and provide valuable information for the field of function prediction. Proteins 2013. © 2012 Wiley Periodicals, Inc.
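
The level-wise prediction scheme outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library, only two of the three levels (main class and subclass) are shown, and the two-column descriptor vectors and their EC labels are invented toy values rather than DoGSiteScorer output. All function and class names are hypothetical.

```python
from collections import defaultdict


def _sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestCentroid:
    """Stand-in classifier (NOT an SVM): predicts the label of the closest class centroid."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        # One centroid per label: the column-wise mean of its descriptor vectors.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda label: _sq_dist(x, self.centroids[label]))


def fit_hierarchy(X, ec_labels):
    """Train one main-class model plus one subclass model per main class.

    ec_labels holds (main_class, subclass) tuples, e.g. (2, 7).
    """
    main_model = NearestCentroid().fit(X, [main for main, _ in ec_labels])
    by_main = defaultdict(lambda: ([], []))
    for x, (main, sub) in zip(X, ec_labels):
        by_main[main][0].append(x)
        by_main[main][1].append(sub)
    sub_models = {
        main: NearestCentroid().fit(xs, subs) for main, (xs, subs) in by_main.items()
    }
    return main_model, sub_models


def predict_ec(models, x):
    """Predict the main class first, then the subclass within that main class."""
    main_model, sub_models = models
    main = main_model.predict_one(x)
    return main, sub_models[main].predict_one(x)


def cross_validated_accuracy(X, ec_labels, k=3):
    """k-fold cross-validation; returns (main-class accuracy, main+subclass accuracy)."""
    n = len(X)
    hits_main = hits_sub = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample forms the test fold
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [ec_labels[i] for i in range(n) if i not in test_idx]
        models = fit_hierarchy(train_X, train_y)
        for i in test_idx:
            main, sub = predict_ec(models, X[i])
            hits_main += main == ec_labels[i][0]
            hits_sub += (main, sub) == ec_labels[i]
    return hits_main / n, hits_sub / n


# Invented toy data: two hypothetical pocket descriptors per pocket.
X = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.2),     # labelled EC 2.1 (toy)
    (1.0, 5.0), (1.2, 5.2), (0.8, 4.8),     # labelled EC 2.7 (toy)
    (10.0, 1.0), (10.2, 1.1), (9.8, 0.9),   # labelled EC 3.1 (toy)
    (10.0, 5.0), (10.1, 5.2), (9.9, 4.8),   # labelled EC 3.4 (toy)
]
y = [(2, 1)] * 3 + [(2, 7)] * 3 + [(3, 1)] * 3 + [(3, 4)] * 3

if __name__ == "__main__":
    print(cross_validated_accuracy(X, y, k=3))
    print(predict_ec(fit_hierarchy(X, y), (1.1, 4.9)))
```

To move closer to the setup the abstract describes, `NearestCentroid` would be replaced by an SVM (for example one trained with LIBSVM) and a third, substrate-specific level added in the same way; the level-wise structure and the cross-validation loop would stay the same.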
