Relating Complexity and Error Rates of Ontology Concepts

OBJECTIVES Ontologies are knowledge structures that support many health-information systems. A study is carried out to assess the quality of ontological concepts based on a measure of their complexity. The results show a relation between the complexity of concepts and their error rates.

METHODS A measure of lateral complexity, defined as the number of role types a concept exhibits, is used to distinguish more complex concepts from simpler ones. Concepts are divided into two groups along these lines using an area taxonomy, a kind of abstraction network that summarizes the structural organization of an ontology. A sample of concepts from each group is then subjected to a two-phase QA analysis to uncover and verify errors and inconsistencies in their modeling. A hierarchy of the National Cancer Institute thesaurus (NCIt) is used as the test-bed. A hypothesis pertaining to the expected error rates of the complex and simple concepts is tested.

RESULTS The study was carried out on the NCIt's Biological Process hierarchy. Various errors, including missing roles, incorrect role targets, and incorrectly assigned roles, were discovered and verified in the two phases of the QA analysis. The overall findings confirmed the hypothesis by showing a statistically significant difference between the number of errors exhibited by more laterally complex concepts and the number exhibited by simpler concepts.

CONCLUSIONS QA is an essential part of any ontology's maintenance regimen. In this paper, we report the results of a QA study targeting two groups of ontology concepts distinguished by their level of complexity, defined in terms of the number of exhibited role types. The study was carried out on a major component of an important ontology, the NCIt. The findings suggest that more complex concepts tend to have a higher error rate than simpler concepts. These findings can be used to guide ongoing efforts in ontology QA.
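The grouping and comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the concept names, the role names (`role_a`, `role_b`, `role_c`), and the complexity threshold are all hypothetical, and the abstract does not say which statistical test was used, so a pooled two-proportion z-test is assumed here purely for illustration.

```python
from math import erf, sqrt

# Hypothetical toy data: concept name -> set of role types it exhibits.
# None of these names come from the NCIt.
concept_roles = {
    "Process A": {"role_a"},
    "Process B": {"role_a", "role_b"},
    "Process C": set(),
    "Process D": {"role_a", "role_b", "role_c"},
}


def lateral_complexity(roles):
    """Lateral complexity: the number of role types a concept exhibits."""
    return len(roles)


def group_by_area(concept_roles):
    """Group concepts that exhibit exactly the same set of role types.

    This mirrors the idea of an 'area' in an area taxonomy; the real
    framework also tracks hierarchical links between areas, omitted here.
    """
    areas = {}
    for concept, roles in concept_roles.items():
        areas.setdefault(frozenset(roles), []).append(concept)
    return areas


def split_by_complexity(concept_roles, threshold):
    """Split concepts into (complex, simple) using an assumed threshold."""
    complex_group = sorted(
        c for c, r in concept_roles.items() if lateral_complexity(r) >= threshold
    )
    simple_group = sorted(
        c for c, r in concept_roles.items() if lateral_complexity(r) < threshold
    )
    return complex_group, simple_group


def two_proportion_z_test(errors_1, n_1, errors_2, n_2):
    """Two-sided pooled z-test for a difference between two error rates.

    Assumed test; returns (z statistic, two-sided p-value).
    """
    p_1, p_2 = errors_1 / n_1, errors_2 / n_2
    pooled = (errors_1 + errors_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)
```

With audited samples in hand, one would call `two_proportion_z_test` with the error count and sample size of each group. A small p-value would be consistent with the reported finding, though an exact test may be preferable for small samples.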
