Data Integration for Future Medicine (DIFUTURE)

Summary

Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Comprehensive data and knowledge, in both depth and breadth, need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers.

Objectives: The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians with innovative views on integrated data as well as with decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level, with a specific focus on data integration and sharing.

Governance and Policies: Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies that safeguard data use and sharing through technical and organizational measures providing the highest levels of data protection. One of our central policies is that adequate methods of data sharing will be selected for each use case and project based on rigorous risk and threat analyses. Interdisciplinary groups have been established to manage change.
Architectural Framework and Methodology: The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing.

Use Cases: From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e., it follows the needs of physicians and researchers and aims at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with a focus on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios.

Discussion: Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already implemented and tested prototypes of the most important components of our architecture.
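
To make the three-step approach named under Architectural Framework and Methodology more concrete, the following is a minimal sketch in Python, not the DIFUTURE implementation. It assumes two simplified, FHIR-style Observation resources for hemoglobin that arrive in different units. The function names (`import_observation`, `harmonize`, `to_fact`, `run_pipeline`), the example data and the unit table are hypothetical; the output fields (`patient_num`, `concept_cd`, `nval_num`, `units_cd`) are only loosely modeled on the i2b2 observation fact table.

```python
from typing import Any, Dict, List

# Hypothetical input: simplified FHIR-style Observation resources from two sites.
EXAMPLE_RESOURCES: List[Dict[str, Any]] = [
    {
        "resourceType": "Observation",
        "subject": {"reference": "Patient/p001"},
        "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
        "valueQuantity": {"value": 14.2, "unit": "g/dL"},
    },
    {
        "resourceType": "Observation",
        "subject": {"reference": "Patient/p002"},
        "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
        "valueQuantity": {"value": 135, "unit": "g/L"},
    },
]

# Assumed conversion table: divisor that turns a source unit into g/dL.
DIVISOR_TO_G_PER_DL = {"g/dL": 1.0, "g/L": 10.0}


def import_observation(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Step 1: import a standards-based resource into a flat record."""
    coding = resource["code"]["coding"][0]
    quantity = resource["valueQuantity"]
    return {
        "patient": resource["subject"]["reference"].split("/")[-1],
        "system": coding["system"],
        "code": coding["code"],
        "value": float(quantity["value"]),
        "unit": quantity["unit"],
    }


def harmonize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2: staging-area transformation, here a unit conversion to g/dL."""
    divisor = DIVISOR_TO_G_PER_DL[record["unit"]]
    return {**record, "value": record["value"] / divisor, "unit": "g/dL"}


def to_fact(record: Dict[str, Any]) -> Dict[str, Any]:
    """Step 3: map the harmonized record to an analytics-style fact row."""
    prefix = "LOINC" if record["system"] == "http://loinc.org" else "LOCAL"
    return {
        "patient_num": record["patient"],
        "concept_cd": f"{prefix}:{record['code']}",
        "nval_num": record["value"],
        "units_cd": record["unit"],
    }


def run_pipeline(resources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Chain the three steps for every incoming resource."""
    return [to_fact(harmonize(import_observation(r))) for r in resources]


if __name__ == "__main__":
    for fact in run_pipeline(EXAMPLE_RESOURCES):
        print(fact)
```

Keeping the steps as separate functions mirrors the separation between import, staging and analytics platforms described above; a real Data Integration Center would additionally need terminology mapping, pseudonymization and provenance tracking, which this sketch leaves out.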
