Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis

Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to share data in a way that preserves both individual privacy and data usability. Because traditional privacy-enhancing technologies have significant shortcomings, institutions rely on bespoke data-sharing contracts. These contracts make data sharing less efficient and may disincentivize important clinical treatment and medical research. This paper provides a synthesis of two novel, advanced privacy-enhancing technologies (PETs): homomorphic encryption (HE) and secure multiparty computation (SMC), which combined form multiparty homomorphic encryption (MHE). These PETs provide a mathematical guarantee of privacy, and MHE offers a performance advantage over using HE or SMC separately. We argue that MHE fulfills the legal requirements for medical data sharing under the General Data Protection Regulation (GDPR), which has set a global benchmark for data protection. Specifically, data processed and shared using MHE can be considered anonymized data. We explain how MHE can reduce the reliance on customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research whilst offering additional incentives for healthcare and research institutes to employ common data interoperability standards.
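To make the idea of combining HE with SMC more concrete, the following is a minimal, deliberately insecure Python sketch of a multiparty additively homomorphic scheme. It swaps in exponential ElGamal over a toy 11-bit group (p = 2039) instead of the lattice-based constructions that practical MHE systems typically use, and every function name (`keygen`, `encrypt`, `add`, `partial_decrypt`, `combine`) and the per-hospital counts are hypothetical. Each site holds one share of the secret key, all sites encrypt under a collective public key, ciphertexts are aggregated homomorphically, and decryption requires a partial decryption from every site.

```python
"""Toy multiparty additively homomorphic encryption (exponential ElGamal).

INSECURE: tiny hypothetical parameters, for illustration only.
"""
import secrets

# Toy group: P = 2Q + 1 with P and Q prime; G = 4 generates the order-Q subgroup.
P = 2039
Q = 1019
G = 4


def keygen():
    """Each site draws its own secret-key share sk_i and publishes g^sk_i."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def collective_public_key(public_shares):
    """Collective key g^(sum of shares); no single site knows the summed secret."""
    pk = 1
    for h in public_shares:
        pk = pk * h % P
    return pk


def encrypt(pk, m):
    """Exponential ElGamal encryption of small integer m: (g^r, g^m * pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, m, P) * pow(pk, r, P) % P


def add(ct_a, ct_b):
    """Componentwise product of two ciphertexts encrypts the sum of the plaintexts."""
    return ct_a[0] * ct_b[0] % P, ct_a[1] * ct_b[1] % P


def partial_decrypt(sk, ct):
    """Each site contributes c1^sk_i without revealing its share sk_i."""
    return pow(ct[0], sk, P)


def combine(ct, partials):
    """Remove pk^r using all partial decryptions, then brute-force a small discrete log."""
    mask = 1
    for d in partials:
        mask = mask * d % P
    gm = ct[1] * pow(mask, -1, P) % P  # modular inverse needs Python 3.8+
    for m in range(Q):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("plaintext out of range")


if __name__ == "__main__":
    counts = [120, 340, 95]  # hypothetical per-hospital patient counts
    keys = [keygen() for _ in counts]
    pk = collective_public_key([h for _, h in keys])
    total = encrypt(pk, 0)
    for c in counts:
        total = add(total, encrypt(pk, c))
    partials = [partial_decrypt(sk, total) for sk, _ in keys]
    print(combine(total, partials))  # prints 555
```

The design choice in this sketch is N-out-of-N decryption: the aggregate can only be recovered when every site contributes its partial decryption, so no single institution can decrypt the others' contributions on its own. Exponential ElGamal supports only addition and needs a small discrete-log search at decryption, which is why the plaintext total must stay below Q.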
