FEDRR: Fast, Exhaustive Detection of Redundant Hierarchical Relations in Large Biomedical Ontologies

Redundant hierarchical relations are patterns in which there are two paths from one concept to another, one of length one (direct) and the other of length greater than one (indirect). This paper introduces a novel and scalable approach, called FEDRR (Fast, Exhaustive Detection of Redundant Relations), for quality assurance work during ontological evolution. FEDRR combines the algorithmic ideas of Dynamic Programming and Topological Sort for exhaustive mining of all redundant hierarchical relations in ontological hierarchies in O(c|V| + |E|) time, where |V| is the number of concepts, |E| is the number of relations, and c is a constant in practice. Using FEDRR, we performed an exhaustive search for all redundant is-a relations in two of the largest ontological systems in biomedicine: SNOMED CT and the Gene Ontology (GO). We found 235 and 1609 redundant is-a relations in the 2015-03-01 version of SNOMED CT and the 2015-05-01 version of GO, respectively. Each redundant relation represents a possibly unintended defect that needs to be corrected in the ontology quality assurance process. FEDRR provides a generally applicable, effective tool for systematically detecting redundant relations in large ontological systems for quality improvement.
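As a rough illustration of combining a topological sort with dynamic programming, the following Python sketch (not the FEDRR implementation) uses Kahn's algorithm so that every concept is visited only after all of its parents, then builds each concept's ancestor set from its parents' ancestor sets. A direct is-a edge (c, p) is flagged as redundant when p already appears among the ancestors of some parent of c, i.e. when an indirect path of length greater than one also reaches p. The function name `redundant_isa_edges` and the (child, parent) edge-list input are assumptions made for illustration. Because it stores full ancestor sets, this sketch does not claim the O(c|V| + |E|) bound stated for FEDRR.

```python
from collections import defaultdict, deque


def redundant_isa_edges(edges):
    """Return the set of redundant (child, parent) is-a edges.

    Hypothetical helper: an edge (c, p) is redundant when p is also reachable
    from c through another parent of c, i.e. via a path of length > 1.
    Assumes the hierarchy is a DAG; raises ValueError if it has a cycle.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
        nodes.update((child, parent))

    # Kahn's topological sort: a concept is dequeued only after all its parents.
    pending = {n: len(parents[n]) for n in nodes}
    queue = deque(n for n in nodes if pending[n] == 0)
    ancestors = {}  # DP table: concept -> all ancestors (paths of length >= 1)
    redundant = set()
    processed = 0

    while queue:
        node = queue.popleft()
        processed += 1
        # Ancestors reachable from `node` through paths of length >= 2.
        indirect = set()
        for p in parents[node]:
            indirect |= ancestors[p]
        # A direct parent that is also reachable indirectly marks a redundant edge.
        for p in parents[node]:
            if p in indirect:
                redundant.add((node, p))
        ancestors[node] = indirect | parents[node]
        for c in children[node]:
            pending[c] -= 1
            if pending[c] == 0:
                queue.append(c)

    if processed != len(nodes):
        raise ValueError("is-a hierarchy contains a cycle")
    return redundant


if __name__ == "__main__":
    # Hypothetical toy hierarchy: A is-a B, B is-a C, A is-a C (shortcut), C is-a D.
    toy = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
    print(redundant_isa_edges(toy))  # {('A', 'C')}
```

In the toy hierarchy only the shortcut A is-a C is reported, because C is already an ancestor of A's other parent B. A diamond (two distinct parents sharing a common ancestor) is not flagged, since neither direct edge duplicates a longer path.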
