A Data Model Based on Semantically Enhanced HL7 RIM for Sharing Patient Data of Breast Cancer Clinical Trials

Breast cancer clinical trial researchers have to handle heterogeneous data coming from different data sources, which burdens them when they need to query data for retrospective analysis. This paper presents the Common Data Model (CDM) proposed within the INTEGRATE EU project to homogenize data coming from different clinical partners. The CDM is based on the Reference Information Model (RIM) of Health Level 7 (HL7) version 3. Semantic capabilities through a SPARQL endpoint were also required to ensure the sustainability of the platform. For the SPARQL endpoint implementation, two options were compared: a relational SQL database combined with D2R, and an RDF database. The results show that the first option can store all clinical data received from the institutions participating in the project and offers better performance. It has also been evaluated by the EU Commission within a patient recruitment demonstrator.
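
As an illustration of what querying through a SPARQL endpoint might look like from a client's side, the following minimal sketch builds a "query via GET" request URL as defined by the SPARQL 1.1 Protocol. The endpoint URL, the `ex:` prefix, and the property names `ex:subject` and `ex:code` are hypothetical placeholders chosen for this example; they are not taken from the INTEGRATE platform or from the HL7 RIM vocabulary. The sketch only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL (assumption): the real platform URL is not given here.
ENDPOINT = "http://localhost:2020/sparql"

# Hypothetical vocabulary (assumption): prefix and property names are
# placeholders, not the actual CDM or HL7 RIM terms.
QUERY = """
PREFIX ex: <http://example.org/cdm#>
SELECT ?patient ?code
WHERE {
  ?obs ex:subject ?patient ;
       ex:code ?code .
}
LIMIT 10
"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return a 'query via GET' URL: the query text goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_sparql_get_url(ENDPOINT, QUERY)

# Decode the URL again to check that the query text survives percent-encoding.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

Under these assumptions the client code is the same whether the endpoint is backed by a relational database exposed through D2R or by a native RDF store; only the endpoint URL would differ.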