Missing lateral relationships in top-level concepts of an ontology

Ontologies house various kinds of domain knowledge in formal structures, primarily as concepts and the associative relationships between them. They have become integral components of many health information processing environments, so quality assurance of their conceptual content is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can therefore degrade the quality of an ontology.
An abstraction network is a structure that overlays an ontology and provides an alternate, summarizing view of its contents. One kind of abstraction network is the area taxonomy, and a variation of it is the subtaxonomy. We explore a methodology based on these taxonomies for more readily finding missing relationship errors. The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting such errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed anomalous, serves as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of this quality assurance approach are studied.
The methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and to SNOMED CT’s Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, the methodology yielded a statistically significantly higher number of concepts with missing relationship errors than a control sample of concepts. These findings confirm both hypotheses.
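The grouping idea behind the taxonomy-based approach can be illustrated with a small sketch. The abstract does not define an area, so the code assumes the definition common in the abstraction network literature: an area collects all concepts that have exactly the same set of lateral relationship types. It further assumes, as one plausible reading, that the top-level unit of interest is the area whose concepts have no lateral relationships at all. The function names and the toy concepts are hypothetical and are not taken from NCIt or SNOMED CT.

```python
from collections import defaultdict


def build_areas(concept_relationships):
    """Group concepts by their exact set of relationship types.

    Assumption: an "area" is the set of concepts that share the same
    set of lateral relationship types (not stated in the abstract).
    """
    areas = defaultdict(set)
    for concept, rel_types in concept_relationships.items():
        areas[frozenset(rel_types)].add(concept)
    return dict(areas)


def concepts_without_relationships(areas):
    """Return the concepts in the area keyed by the empty set.

    Assumption: this is the top-level grouping unit whose members are
    candidates for review when the unit looks anomalous.
    """
    return areas.get(frozenset(), set())


# Hypothetical toy data: concept name -> relationship types it carries.
toy = {
    "Concept A": set(),
    "Concept B": {"has_location"},
    "Concept C": {"has_location"},
    "Concept D": {"has_location", "has_result"},
    "Concept E": set(),
}

areas = build_areas(toy)
candidates = sorted(concepts_without_relationships(areas))
print(candidates)
```

In this sketch the reviewer would be pointed at the concepts that carry no relationship types. Whether that group counts as anomalous is a judgment the sketch does not attempt to model.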
Quality assurance is a critical part of an ontology’s lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a quality assurance methodology targeted at missing relationship errors. Its successful application to the NCIt’s Biological Process hierarchy and SNOMED CT’s Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.
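The comparison between concepts flagged by the methodology and a control sample can be sketched as a test on two error counts. The abstract does not say which statistical test was used, so the one-sided Fisher's exact test below is only one reasonable choice. The counts are made up for illustration and are not results from the study.

```python
from math import comb


def fisher_one_sided(err_a, n_a, err_b, n_b):
    """One-sided Fisher's exact test on two samples.

    Returns the probability, under the null hypothesis of equal error
    rates, that sample A contains at least err_a erroneous concepts,
    given the sample sizes and the total number of errors.
    """
    total = n_a + n_b
    total_err = err_a + err_b
    denom = comb(total, n_a)
    upper = min(n_a, total_err)
    tail = 0
    for k in range(err_a, upper + 1):
        # Hypergeometric term: k errors fall into sample A.
        tail += comb(total_err, k) * comb(total - total_err, n_a - k)
    return tail / denom


# Hypothetical counts: erroneous concepts found / concepts reviewed.
p_value = fisher_one_sided(err_a=30, n_a=100, err_b=10, n_b=100)
print(f"one-sided p-value: {p_value:.6f}")
```

A small p-value would indicate that the flagged sample contains more erroneous concepts than chance alone would explain, which is the kind of evidence the abstract reports for both test-bed hierarchies.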
