A String Similarity Evaluation for Healthcare Ontologies Alignment to HL7 FHIR Resources

Current healthcare services require health data to be transformed into a common format that conforms to established standards, so that data exchange becomes feasible; this raises the need for interoperability. Most existing techniques in this field address only specific one-to-one data transformation scenarios. Among these solutions, translating healthcare data into ontologies is considered a step towards interoperability. However, ontology transformations often produce different terms for the same concept, which can lead to clinical misinterpretations. To avoid this, ontology alignment techniques match different ontologies using string and semantic similarity metrics, yet very little systematic analysis has been performed on which string similarity metrics perform better. To address this gap, this paper investigates which string similarity metric is the most efficient, building on an existing approach that can transform any healthcare dataset into HL7 FHIR through translation into ontologies and ontology matching based on syntactic and semantic similarities. The evaluation uses four string similarity metrics: the Levenshtein distance, Cosine similarity, Jaro–Winkler distance and Jaccard similarity. The results indicate that the Levenshtein distance provides more reliable results when dealing with healthcare ontologies.
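
To make the four metrics named above concrete, the following is a minimal Python sketch of textbook definitions. It is not the implementation evaluated in the paper. Several choices are assumptions made only for illustration: the Levenshtein distance is normalized by the longer string's length, Cosine and Jaccard operate on character bigrams, and Jaro–Winkler uses the conventional prefix scale of 0.1 with a prefix cap of 4. The label pairs at the end are hypothetical and are not taken from the paper's datasets.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def ngrams(s: str, n: int = 2) -> list:
    """Character n-grams (bigrams by default, an assumption of this sketch)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def jaccard(a: str, b: str) -> float:
    """Size of the bigram-set intersection divided by the size of the union."""
    sa, sb = set(ngrams(a)), set(ngrams(b))
    return 1.0 if not sa and not sb else len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two bigram count vectors."""
    va, vb = Counter(ngrams(a)), Counter(ngrams(b))
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return 0.0 if norm == 0 else dot / norm


def jaro(a: str, b: str) -> float:
    """Jaro similarity from matching characters and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    m = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    t = sum(x != y for x, y in zip(a_seq, b_seq)) / 2  # transpositions
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    # Hypothetical label pairs, chosen only to exercise the functions.
    pairs = [("birthdate", "dateofbirth"), ("gender", "sex"), ("patientname", "name")]
    for left, right in pairs:
        scores = (levenshtein_similarity(left, right), cosine(left, right),
                  jaro_winkler(left, right), jaccard(left, right))
        print(left, right, " ".join(f"{s:.3f}" for s in scores))
```

All four functions return values in the range 0 to 1, with 1 meaning identical strings, so their scores can be compared side by side for a given pair of ontology labels. A different tokenization (for example word tokens instead of character bigrams) would change the Cosine and Jaccard scores, so these numbers should not be read as reproducing the paper's evaluation.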
