Supporting biomedical ontology evolution by identifying outdated concepts and the required type of change

The consistent evolution of ontologies is a major challenge for systems that use semantically enriched data, for example for annotating, indexing, or reasoning. The biomedical domain is a typical example: ontologies expressed in different formalisms have long been used there, and the domain's dynamic nature requires regular revision of the underlying systems. However, automatically identifying outdated concepts and proposing revision actions to update them remain open research questions. Solutions to these problems are of great interest to organizations that manage large, dynamic ontologies. In this paper, we present an approach for (i) identifying the concepts of an ontology that require revision and (ii) suggesting the type of revision. Our analysis is based on three aspects: structural information encoded in the ontology, relational information gained from external sources of knowledge (i.e., PubMed and UMLS), and temporal information derived from the history of the ontology. We evaluate different methods and parameters for supervised learning classifiers that identify both the set of concepts needing revision and the type of revision. We applied our approach to four well-known biomedical ontologies/terminologies (ICD-9-CM, MeSH, NCIt, and SNOMED CT) and compared our results to similar approaches. Our model shows accuracy ranging from 68% (for SNOMED CT) to 91% (for MeSH), with an average of 71% when all datasets are considered together.
