A Quality-Assurance Study of ChEBI

Ontologies are important components of many health-information systems. The Chemical Entities of Biological Interest (ChEBI) ontology has become a standard reference for chemicals appearing in biological contexts, so assuring the quality of its content is imperative. Indeed, ChEBI maintains a dedicated Web page where errors and inconsistencies in its concepts can be reported. We carried out a study of the correctness of a random sample of ChEBI concepts. The results show that a considerable number of ChEBI concepts suffer from some kind of problematic modeling. For example, 15.5% of the sample concepts exhibited severe errors of commission, including incorrect hierarchical (is-a) and lateral relationships. Errors of omission were also prevalent. We present the overall results of our quality-assurance (QA) study and discuss suggestions for enhancing the QA processes in place for ChEBI.

Keywords: ChEBI; chemical ontology; chemical concept; quality assurance; modeling error; error distribution
