Semantic enrichment of longitudinal clinical study data using the CDISC standards and the semantic statistics vocabularies

Background
There is increasing recognition that the data capture phase of clinical studies needs to be improved and that clinical data needs to be shared more effectively. The Health Care and Life Sciences community has embraced semantic technologies to facilitate the integration of health data from electronic health records, clinical studies and pharmaceutical research. This paper explores the integration of clinical study data exchange standards with the semantic statistics vocabularies to deliver clinical data as linked data. The resulting format is easier to enrich with links to complementary data sources and easier for a broad user base to consume.

Methods
We propose a Linked Clinical Data Cube (LCDC), which combines the strengths of the RDF Data Cube and DDI-RDF vocabularies to enrich clinical data based on the CDISC standards. The CDISC standards provide the mechanisms for making the data standardised, more accessible and accountable, whereas the RDF Data Cube and DDI-RDF vocabularies provide novel approaches to managing large volumes of heterogeneous linked data resources.

Results
We validate our approach using a large-scale longitudinal clinical study into neurodegenerative diseases. This dataset, comprising more than 1600 variables clustered in 25 sub-domains, has been fully converted into RDF, forming one main data cube and one specialised cube for each sub-domain. One sub-domain, the Medications specialised cube, has been linked to relevant external vocabularies such as the Australian Medicines Terminology, the ATC DDD taxonomy and the DrugBank terminology. These links provide new dimensions along which the data can be queried, supporting the exploration of drug-drug and drug-disease interactions.

Conclusions
This implementation highlights the effectiveness of combining the semantic statistics vocabularies to publish large heterogeneous data sets as linked data, and of integrating these vocabularies with the CDISC standards.
In particular, it demonstrates the potential of the two vocabularies for overcoming the monolithic nature of the underlying model and for improving the navigation and querying of the data from multiple angles, supporting richer analysis of clinical study data. The anticipated benefits are a more efficient use of clinicians’ time and the potential to facilitate cross-study analysis.
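
To make the idea of a main cube plus per-sub-domain specialised cubes more concrete, here is a minimal Python sketch of how tabular longitudinal records might be emitted as RDF Data Cube observations in Turtle. Only the qb: namespace and the terms qb:DataSet, qb:Observation and qb:dataSet come from the W3C RDF Data Cube vocabulary. The ex: namespace, the properties ex:subject, ex:visit, ex:variable and ex:value, and the toy records are hypothetical. The sketch omits the qb:DataStructureDefinition, the DDI-RDF variable descriptions and the CDISC metadata that a real LCDC would carry, so it illustrates the general technique rather than the authors' implementation.

```python
from collections import defaultdict

# Real W3C RDF Data Cube namespace; everything under ex: is hypothetical.
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/lcdc/> .
"""

# Hypothetical toy records: (subject, visit, sub-domain, variable, value).
# Values are illustrative only, not taken from any real study.
RECORDS = [
    ("S001", "bl", "medications", "drug", "donepezil"),
    ("S001", "m18", "medications", "drug", "donepezil"),
    ("S002", "bl", "demographics", "age", 74),
]


def literal(value):
    """Render a Python value as a Turtle literal (ints bare, others quoted)."""
    if isinstance(value, int):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def observation(dataset, subject, visit, variable, value):
    """Build one qb:Observation in Turtle.

    The observation URI includes the dataset name, so each observation is
    attached to exactly one qb:DataSet.
    """
    uri = f"ex:obs-{dataset}-{subject}-{visit}-{variable}"
    return (
        f"{uri} a qb:Observation ;\n"
        f"    qb:dataSet ex:{dataset} ;\n"
        f"    ex:subject ex:{subject} ;\n"
        f"    ex:visit ex:{visit} ;\n"
        f"    ex:variable ex:{variable} ;\n"
        f"    ex:value {literal(value)} .\n"
    )


def build_cubes(records):
    """Return Turtle for a main cube plus one specialised cube per sub-domain."""
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[rec[2]].append(rec)

    # Main cube: every record, regardless of sub-domain.
    parts = [PREFIXES, "ex:main a qb:DataSet .\n"]
    for subject, visit, _domain, variable, value in records:
        parts.append(observation("main", subject, visit, variable, value))

    # Specialised cubes: one dataset per sub-domain, with its own observations.
    for domain in sorted(by_domain):
        parts.append(f"ex:{domain} a qb:DataSet .\n")
        for subject, visit, _d, variable, value in by_domain[domain]:
            parts.append(observation(domain, subject, visit, variable, value))
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_cubes(RECORDS))
```

Minting a separate observation for each cube keeps every qb:Observation attached to a single qb:DataSet, as the Data Cube integrity constraints require. An alternative design would express each sub-domain as a qb:Slice within a single dataset.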