Outlier concepts auditing methodology for a large family of biomedical ontologies

Summarization networks are compact summaries of ontologies. The “Big Picture” view offered by summarization networks makes it possible to identify sets of concepts that are more likely to have errors than control concepts. For ontologies that have outgoing lateral relationships, we have developed the “partial-area taxonomy” summarization network. Prior research has identified one kind of outlier concept: concepts of small partial-areas within partial-area taxonomies. We have previously shown that the small partial-area technique works successfully for four ontologies (or their hierarchies). To improve the scalability of Quality Assurance (QA), a family-based QA framework was developed, in which one QA technique is potentially applicable to a whole family of ontologies with similar structural features. The 373 ontologies hosted at the NCBO BioPortal in 2015 were classified into a collection of families based on structural features. A meta-ontology represents this family collection, including one family of ontologies having outgoing lateral relationships. The process of updating the current meta-ontology is described. To conclude that one QA technique is applicable to at least half of the members of a family F, the technique should be demonstrated as successful for six out of six ontologies in F. We describe a hypothesis that sets the condition required for a technique to be successful for a given ontology. The process of a study to demonstrate such success is described. This paper aims to demonstrate the scalability of the small partial-area technique. We first updated the meta-ontology, classifying 566 BioPortal ontologies. There were 371 ontologies in the family with outgoing lateral relationships. We demonstrated the success of the small partial-area technique for two ontology hierarchies that belong to this family: SNOMED CT’s Specimen hierarchy and NCIt’s Gene hierarchy.
Together with the four previous ontologies from the same family, these two fulfill the “six out of six” condition required to show scalability for the whole family. We have shown that the small partial-area technique can potentially be successful for the family of ontologies with outgoing lateral relationships in BioPortal, thus improving the scalability of this QA technique.
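
The “six out of six” condition can be illustrated with a short calculation. The sketch below is one plausible reading, not a derivation confirmed by the abstract: it assumes the tested ontologies behave like independent draws from the family and uses an exact (Clopper-Pearson) two-sided 95% confidence interval for a binomial proportion. The function names and the 95% level are assumptions introduced here for illustration. When all n trials succeed, the lower limit of that interval has the closed form (alpha/2)^(1/n), and the code looks for the smallest n whose lower limit exceeds one half.

```python
def lower_limit_all_successes(n: int, confidence: float = 0.95) -> float:
    """Exact two-sided lower confidence limit on the success proportion
    when all n trials succeed (closed form of the Clopper-Pearson bound)."""
    alpha = 1.0 - confidence
    # Solve p**n = alpha / 2 for p.
    return (alpha / 2.0) ** (1.0 / n)


def min_all_success_trials(threshold: float = 0.5, confidence: float = 0.95) -> int:
    """Smallest n such that n successes out of n trials push the lower
    confidence limit above the given threshold."""
    n = 1
    while lower_limit_all_successes(n, confidence) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(4, 8):
        print(n, round(lower_limit_all_successes(n), 4))
    print("minimum n:", min_all_success_trials())
```

Under these assumptions, five successes out of five leave the lower limit just below one half, while six out of six push it above, which is consistent with the condition stated in the abstract.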
