PIERO ontology for analysis of biochemical transformations: Effective implementation of reaction information in the IUBMB enzyme list

Genomics faces many partially annotated putative enzyme-encoding genes whose activities have not yet been verified, while metabolomics faces many putative enzyme reactions whose full equations have not been verified. Knowledge of enzymes has been collected by the IUBMB and made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively in bioinformatics studies. Instead, most bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List and demonstrated that the partial characteristics of reactions cannot be retrieved by using EC numbers alone. We therefore developed a novel ontology, named PIERO, for annotating biochemical transformations, as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List and grouped into terms related to overall reactions and terms related to biochemical transformations. Subsequently, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. The ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from given transformations. Further development of the PIERO ontology should enhance the Enzyme List and promote the integration of genomics and metabolomics.
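The two uses described above (extracting the transformations shared by a set of enzymes, and listing possible enzymes for given transformations) can be illustrated with a minimal sketch. It assumes a toy lookup table from EC numbers to transformation labels; the table, the label strings, and the function names `common_transformations` and `candidate_enzymes` are hypothetical and are not taken from PIERO itself.

```python
# Hypothetical toy annotations: EC number -> set of transformation labels.
# The labels are illustrative placeholders, not actual PIERO terms.
EC_TO_TRANSFORMATIONS = {
    "1.1.1.1": {"alcohol oxidation", "NAD+ reduction"},
    "1.1.1.27": {"alcohol oxidation", "NAD+ reduction"},
    "1.2.1.12": {"aldehyde oxidation", "NAD+ reduction", "phosphorylation"},
    "2.7.1.1": {"phosphorylation"},
}


def common_transformations(ec_numbers):
    """Return the transformation labels shared by every listed EC number."""
    sets = [EC_TO_TRANSFORMATIONS[ec] for ec in ec_numbers]
    return set.intersection(*sets) if sets else set()


def candidate_enzymes(transformations):
    """Return EC numbers whose annotations include all given labels."""
    wanted = set(transformations)
    return sorted(
        ec for ec, terms in EC_TO_TRANSFORMATIONS.items() if wanted <= terms
    )


if __name__ == "__main__":
    # These two enzymes sit in different EC subclasses, yet share a partial
    # characteristic that the EC numbers alone do not expose.
    print(common_transformations(["1.1.1.1", "1.2.1.12"]))
    print(candidate_enzymes({"phosphorylation"}))
```

In this toy table, the shared label is found by set intersection rather than by comparing EC prefixes, which is the kind of partial characteristic that the EC hierarchy by itself would not surface.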
