Design and generation of Linked Clinical Data Cubes

Clinical Study Data Exchange technologies, based on XML, have improved the data capture phase of clinical data and enabled larger and more diverse longitudinal clinical research studies. There is now growing interest in this community in solutions based on Semantic Web standards. Healthcare and life sciences metadata resources such as medication classifications are now shared via linked data platforms. The increasing pressure to make clinical trial data more open is another strong incentive for the adoption of linked open data technologies. This paper describes the application of semantic statistics vocabularies to deliver clinical data as linked data in a form that is easy for statisticians to consume and easy to enrich with links to complementary data sources. We combine the strengths of the RDF Data Cube and DDI-RDF vocabularies to propose a Linked Clinical Data Cube (LCDC), a set of modular data cubes that helps us manage the multi-disciplinary nature of the source data. We validate our approach on the Australian Imaging, Biomarker and Lifestyle study of Ageing (AIBL). This dataset, comprising more than 1600 variables clustered in 25 different sub-domains, has been fully converted into RDF with one general data cube and one specialised data cube for each sub-domain. This implementation demonstrates the effectiveness of combining the RDF Data Cube and DDI-RDF vocabularies for the publication of large and diverse clinical datasets as linked data. We also show that the structure of the LCDC overcomes the monolithic nature of clinical data exchange standards and expedites the navigation and querying of the data from multiple views.
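
To make the modular structure concrete, the following is a minimal sketch of how records could be split into one data cube per sub-domain, with each value emitted as an RDF Data Cube observation. It is an illustration under stated assumptions, not the conversion used for AIBL: the `http://example.org/lcdc/` namespace, the `dim/` and `measure/` property names, the sample records, and the choice to model each variable as its own measure property are all hypothetical. Only `qb:Observation` and `qb:dataSet` come from the RDF Data Cube vocabulary. The general cube and the DDI-RDF variable descriptions are left out.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"      # RDF Data Cube vocabulary
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/lcdc/"               # hypothetical namespace


def observation_triples(cube, rec):
    """Return the triples describing one observation inside the given cube."""
    obs = f"{EX}obs/{cube}/{rec['participant']}/{rec['visit']}/{rec['variable']}"
    return [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", EX + "cube/" + cube),
        # Hypothetical dimension properties: participant and visit.
        (obs, EX + "dim/participant", EX + "participant/" + rec["participant"]),
        (obs, EX + "dim/visit", EX + "visit/" + rec["visit"]),
        # Hypothetical measure property, one per clinical variable.
        (obs, EX + "measure/" + rec["variable"], float(rec["value"])),
    ]


def build_cubes(records):
    """Group records by sub-domain so that each sub-domain gets its own cube."""
    cubes = defaultdict(list)
    for rec in records:
        cubes[rec["subdomain"]].extend(observation_triples(rec["subdomain"], rec))
    return dict(cubes)


def to_ntriples(triples):
    """Serialise triples as N-Triples; floats become xsd:double literals."""
    lines = []
    for s, p, o in triples:
        obj = f'"{o}"^^<{XSD}double>' if isinstance(o, float) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Invented sample records, for illustration only.
records = [
    {"participant": "p001", "visit": "baseline", "subdomain": "cognition",
     "variable": "mmse", "value": 28},
    {"participant": "p001", "visit": "baseline", "subdomain": "imaging",
     "variable": "hippocampal_volume", "value": 3.1},
]

cubes = build_cubes(records)
for name, triples in cubes.items():
    print(f"# cube: {name}")
    print(to_ntriples(triples))
```

Because every observation points to its cube through `qb:dataSet`, a query can be restricted to a single sub-domain or span several of them, which is the kind of multi-view navigation the modular design is meant to support.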