A distributed analytics platform to execute FHIR-based phenotyping algorithms

Despite the benefits of reusing health data collected in routine care, sharing datasets beyond organizational boundaries is not always possible because of legal and ethical restrictions. The Personal Health Train (PHT) is a novel privacy-preserving approach that executes analytics tasks at distributed data repositories without sharing the data itself. In this work, we report a proof-of-concept implementation of the PHT using the FHIR data standard and the Clinical Quality Language (CQL). Semantic Web and containerization technologies are used to move computations to the data. We developed tools for designing phenotyping algorithms on the data consumer side and implemented an infrastructure that transfers Docker containers to the data centers, executes them there, and returns the results to the consumers. We evaluated the PHT infrastructure and tools by designing a phenotyping algorithm for a diabetes mellitus and prostate cancer risk case-control study and executing it at three distributed FHIR repositories.
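The idea of moving the computation to the data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Station` class, the `phenotype_counts` and `run_train` functions, the condition labels, and the toy patient records are all hypothetical, and plain in-memory dictionaries stand in for FHIR repositories and Docker containers. The sketch only shows that each station runs the algorithm locally and returns aggregate counts, never patient-level records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical, simplified stand-in for a FHIR repository: each patient is a
# dict holding a set of condition labels. A real station would expose FHIR
# resources and evaluate a CQL-based algorithm inside a container.
Patient = Dict[str, Set[str]]
Counts = Dict[str, int]

# Illustrative labels only; a real algorithm would use coded value sets.
DIABETES = "diabetes-mellitus"
PROSTATE_CANCER = "prostate-cancer"


@dataclass
class Station:
    """A data provider that runs incoming algorithms against its own data."""
    name: str
    patients: List[Patient]

    def execute(self, algorithm: Callable[[List[Patient]], Counts]) -> Counts:
        # Only the algorithm's aggregate output leaves the station.
        return algorithm(self.patients)


def phenotype_counts(patients: List[Patient]) -> Counts:
    """Count diabetes exposure among prostate cancer cases and controls."""
    counts = {"case_exposed": 0, "case_unexposed": 0,
              "control_exposed": 0, "control_unexposed": 0}
    for patient in patients:
        conditions = patient["conditions"]
        group = "case" if PROSTATE_CANCER in conditions else "control"
        exposure = "exposed" if DIABETES in conditions else "unexposed"
        counts[f"{group}_{exposure}"] += 1
    return counts


def run_train(stations: List[Station],
              algorithm: Callable[[List[Patient]], Counts]) -> Counts:
    """Send the algorithm to every station and sum the returned counts."""
    total: Counts = {}
    for station in stations:
        for key, value in station.execute(algorithm).items():
            total[key] = total.get(key, 0) + value
    return total


# Toy records invented for illustration; they are not study data.
stations = [
    Station("A", [{"conditions": {PROSTATE_CANCER, DIABETES}},
                  {"conditions": {PROSTATE_CANCER}},
                  {"conditions": set()}]),
    Station("B", [{"conditions": {DIABETES}},
                  {"conditions": {PROSTATE_CANCER}}]),
    Station("C", [{"conditions": {PROSTATE_CANCER, DIABETES}},
                  {"conditions": set()},
                  {"conditions": {DIABETES}}]),
]

result = run_train(stations, phenotype_counts)
print(result)
```

In this sketch the consumer sees only the summed 2x2 table of cases and controls by exposure, which is the kind of aggregate a case-control analysis needs.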
