A Quality Assurance Methodology for ChEBI Ontology Focusing on Uncommonly Modeled Concepts

The Chemical Entities of Biological Interest (ChEBI) ontology is an important knowledge source for chemical entities in a biological context. ChEBI is large and complex, which makes it almost impossible to keep error-free given the scarce resources available for quality assurance (QA). We present a methodology for locating ChEBI concepts that have a high probability of being erroneous. The methodology is supported by an Abstraction Network, which provides a compact summarization of an ontology. By investigating a sample of ChEBI concepts, we show that uncommonly modeled concepts residing in small units of the ChEBI Abstraction Network are statistically significantly more likely to have errors than other concepts. This finding may guide ChEBI curators to focus their limited QA resources on such concepts and so achieve a better QA yield. Furthermore, this study, combined with previous work, contributes to showing that the methodology can be applied to a whole family of similar ontologies.

Keywords: ChEBI; chemical ontology; chemical concept; quality assurance; modeling error
