Evaluating Functional Annotations of Enzymes Using the Gene Ontology

The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25–29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Three tasks are critical when working with such a dataset: identifying the nearest well-annotated homologue of a protein of interest, predicting where misannotation has occurred, and knowing how confident you can be in the annotations assigned to those proteins. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These inferences can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521–530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

[1] Andrew G. McDonald, et al. ExplorEnz: the primary source of the IUBMB enzyme list, 2008, Nucleic Acids Res.

[2] Patricia C. Babbitt, et al. Pythoscape: a framework for generation of large protein similarity networks, 2012, Bioinform.

[3] M. Ashburner, et al. Gene Ontology: tool for the unification of biology, 2000, Nature Genetics.

[4] P. Bork, et al. Exploitation of gene context, 2000, Current opinion in structural biology.

[5] Thomas E. Ferrin, et al. Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies, 2009, PLoS ONE.

[6] Paolo Fontana, et al. Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms, 2012, BMC Bioinformatics.

[7] Yanqing Ji, et al. Modern Computational Techniques for the HMMER Sequence Analysis, 2013, ISRN bioinformatics.

[8] Rolf Apweiler, et al. IntEnz, the integrated relational enzyme database, 2004, Nucleic Acids Res.

[9] Heidi J. Imker, et al. Prediction and assignment of function for a divergent N-succinyl amino acid racemase, 2007, Nature chemical biology.

[10] K. F. Tipton, et al. Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). Enzyme nomenclature. Recommendations 1992. Supplement: corrections and additions, 1994.

[11] Nomenclature committee of the international union of biochemistry and molecular biology (NC-IUBMB), Enzyme Supplement 5 (1999), 1999, European journal of biochemistry.

[12] J. Warwicker, et al. Sequence and structural features of enzymes and their active sites by EC class, 2009, Journal of molecular biology.

[13] P. Babbitt. Definitions of enzyme function for the structural genomics era, 2003, Current opinion in chemical biology.

[14] Michael A. Hicks, et al. The Structure–Function Linkage Database, 2013, Nucleic Acids Res.

[15] Ning Ma, et al. BLAST+: architecture and applications, 2009, BMC Bioinformatics.

[16] Nicholas Furnham, et al. Complementary Sources of Protein Functional Information: The Far Side of GO, 2017, Methods in molecular biology.

[17] S. Brenner. Errors in genome annotation, 1999, Trends in genetics: TIG.

[18] Anushya Muruganujan, et al. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees, 2012, Nucleic Acids Res.

[19] Erin Beck, et al. TIGRFAMs and Genome Properties in 2013, 2012, Nucleic Acids Res.

[20] Steven E. Brenner, et al. SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures, 2013, Nucleic Acids Res.

[21] Paul D. Thomas, et al. The Gene Ontology and the Meaning of Biological Function, 2017, Methods in molecular biology.

[22] Huaiyu Mi, et al. The InterPro protein families database: the classification resource after 15 years, 2014, Nucleic Acids Res.

[23] Richard N. Armstrong, et al. Large-Scale Determination of Sequence, Structure, and Function Relationships in Cytosolic Glutathione Transferases across the Biosphere, 2014, PLoS biology.

[24] P. Babbitt, et al. Divergent Evolution in Enolase Superfamily: Strategies for Assigning Functions, 2011, The Journal of Biological Chemistry.

[25] Lydia E. Kavraki, et al. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids, 2008, BMC Bioinformatics.

[26] The UniProt Consortium, et al. UniProt: a hub for protein information, 2014, Nucleic Acids Res.

[27] David A. Lee, et al. CATH: comprehensive structural and functional annotations for genome sequences, 2014, Nucleic Acids Res.

[28] Ian Sillitoe, et al. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis, 2011, Nucleic Acids Res.

[29] J. A. Blake, et al. Program description: Strategies for biological annotation of mammalian systems: implementing gene ontologies in mouse genome informatics, 2001, Genomics.

[30] Haipeng Liu, et al. MoonProt: a database for proteins that are known to moonlight, 2013, Nucleic Acids Res.

[31] C. Webber, et al. Functional Enrichment Analysis with Structural Variants: Pitfalls and Strategies, 2011, Cytogenetic and Genome Research.

[32] L. Sampaleanu, et al. Mutational analysis of duck delta 2 crystallin and the structure of an inactive mutant with bound substrate provide insight into the enzymatic mechanism of argininosuccinate lyase, 2002, The Journal of biological chemistry.

[33] Heidi J. Imker, et al. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks, 2015, Biochimica et biophysica acta.

[34] Sidahmed Benabderrahmane, et al. IntelliGO: a new vector-based semantic similarity measure including annotation origin, 2010, BMC Bioinformatics.

[35] Dan S. Tawfik, et al. Mechanisms of Protein Sequence Divergence and Incompatibility, 2013, PLoS genetics.

[36] The UniProt Consortium. UniProt: the universal protein knowledgebase, 2018, Nucleic acids research.

[37] Rodrigo Lopez, et al. The EMBL-EBI bioinformatics web and programmatic tools framework, 2015, Nucleic Acids Res.

[38] Christine A. Orengo, et al. Protein function prediction using domain families, 2013, BMC Bioinformatics.

[39] Mário J. Silva, et al. Measuring semantic similarity between Gene Ontology terms, 2007, Data Knowl. Eng.

[40] Predrag Radivojac, et al. Community-Wide Evaluation of Computational Function Prediction, 2016, Methods in molecular biology.

[41] Eyal Akiva, et al. [FeFe]-hydrogenase maturation: insights into the role HydE plays in dithiomethylamine biosynthesis, 2015, Biochemistry.

[42] P. Dobson, et al. Predicting enzyme class from protein structure without alignments, 2005, Journal of molecular biology.

[43] Daniel W. A. Buchan, et al. A large-scale evaluation of computational protein function prediction, 2013, Nature Methods.

[44] Patricia C. Babbitt, et al. New Insights about Enzyme Evolution from Large Scale Studies of Sequence and Structure Relationships, 2014, The Journal of Biological Chemistry.

[45] James C. Hu, et al. Primer on the Gene Ontology, 2016, Methods in molecular biology.

[46] Christophe Dessimoz, et al. The Gene Ontology Handbook, 2017, Methods in Molecular Biology.

[47] Phillip W. Lord, et al. Semantic Similarity in Biomedical Ontologies, 2009, PLoS Comput. Biol.

[48] Xiaomei Wu, et al. Improving the Measurement of Semantic Similarity between Gene Ontology Terms and Gene Products: Insights from an Edge- and IC-Based Hybrid Method, 2013, PLoS ONE.

[49] Marcus C. Chibucos, et al. The Evidence and Conclusion Ontology (ECO): Supporting GO Annotations, 2017, Methods in molecular biology.

[50] Patricia C. Babbitt, et al. Evolution of enzymatic activities in the enolase superfamily: stereochemically distinct mechanisms in two families of cis,cis-muconate lactonizing enzymes, 2009, Biochemistry.

[51] David T. Jones, et al. Computational Methods for Annotation Transfers from Sequence, 2016, Methods in molecular biology.

[52] Patricia C. Babbitt, et al. Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies, 2009, PLoS Comput. Biol.

[53] Catia Pesquita, et al. Semantic Similarity in the Gene Ontology, 2017, Methods in molecular biology.

[54] Christophe Dessimoz, et al. Quality of Computationally Inferred Gene Ontology Annotations, 2012, PLoS Comput. Biol.

[55] Robert D. Finn, et al. HMMER web server: interactive sequence similarity searching, 2011, Nucleic Acids Res.

[56] Stephen K. Burley, et al. Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies, 2009, Journal of Structural and Functional Genomics.

[57] P. Babbitt, et al. Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies, 2001, Annual review of biochemistry.

[58] Marcus C. Chibucos, et al. The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations, 2015, Database J. Biol. Databases Curation.

[59] María Martín, et al. UniProt: A hub for protein information, 2015.

[60] Cathy H. Wu, et al. UniProt: the Universal Protein knowledgebase, 2004, Nucleic Acids Res.

[61] Song Zhang, et al. A Bayesian extension of the hypergeometric test for functional enrichment analysis, 2014, Biometrics.

[62] P. Lynne Howell, et al. Mutational Analysis of Duck δ2 Crystallin and the Structure of an Inactive Mutant with Bound Substrate Provide Insight into the Enzymatic Mechanism of Argininosuccinate Lyase, 2002, The Journal of Biological Chemistry.

[63] E. Birney, et al. Pfam: the protein families database, 2013, Nucleic Acids Res.

[64] Judith A. Blake, et al. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report, 2012, PLoS Comput. Biol.