PRINCESS: Privacy‐protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS

Motivation: We introduce PRINCESS, a privacy‐preserving international collaboration framework for analyzing rare disease genetic data that are distributed across different continents. PRINCESS leverages Software Guard Extensions (SGX) and hardware for trustworthy computation. Unlike a traditional international collaboration model, in which individual‐level patient DNA data are physically centralized at a single site, PRINCESS performs secure, distributed computation over encrypted data, fulfilling institutional policies and regulations for protected health information.

Results: To demonstrate the performance and feasibility of PRINCESS, we conducted a family‐based allelic association study for Kawasaki Disease, with data hosted on three different continents. The experimental results show that PRINCESS provides secure and accurate analyses much faster than alternative solutions such as homomorphic encryption and garbled circuits (over 40 000× faster).

Availability and Implementation: https://github.com/achenfengb/PRINCESS_opensource

Contact: shw070@ucsd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
