ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository

Background: The volume and complexity of patient data, especially in personalised medicine, is steadily increasing, both for clinical data and for genomic profiles. Typically more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests) are collected per patient in clinical trials. In oncology, hundreds of mutations can potentially be detected for each patient by genomic profiling. Data integration from multiple sources therefore constitutes a key challenge for medical research and healthcare.

Methods: Semantic annotation of data elements can help identify matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements shall contain the same annotations. However, large terminologies like SNOMED CT or UMLS do not provide uniform coding. It is proposed to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations shall be re-used if a matching data element is available in the metadata repository.

Results: A web-based tool called ODMedit (https://odmeditor.uni-muenster.de/) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations, which were derived from ~5,800 models from the portal of medical data models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal.

Conclusion: Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations, linked to a public metadata repository, is available.
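The re-use rule described under Methods (take the existing annotation whenever a matching data element is already in the repository) can be illustrated with a minimal sketch. It is not ODMedit's implementation: the `MetadataRepository` class, its methods, the label-normalisation matching rule, and the `CODE-…` placeholder codes are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """Toy in-memory stand-in for a public metadata repository (hypothetical)."""

    # Maps a normalised item label to the codes already assigned to it.
    annotations: dict[str, tuple[str, ...]] = field(default_factory=dict)

    @staticmethod
    def normalize(label: str) -> str:
        # Assumed matching rule: case-insensitive, whitespace-insensitive labels.
        return " ".join(label.lower().split())

    def annotate(self, label: str, proposed_codes: list[str]) -> tuple[str, ...]:
        """Return the codes for a data element, re-using existing ones if present."""
        key = self.normalize(label)
        existing = self.annotations.get(key)
        if existing is not None:
            # A matching element exists: re-use its codes to keep coding uniform.
            return existing
        # No match: store the proposed codes so later annotators re-use them.
        codes = tuple(proposed_codes)
        self.annotations[key] = codes
        return codes


repo = MetadataRepository()
# Placeholder codes, not real terminology identifiers.
first = repo.annotate("Body weight", ["CODE-0001"])
second = repo.annotate("  body   WEIGHT ", ["CODE-9999"])
print(first == second)
```

In this sketch the second annotator's differing proposal is discarded in favour of the stored codes, which is what makes the two matching data elements carry the same annotation. A real system would need a far richer notion of "matching" than label normalisation.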
