Enabling Privacy-Preserving Sharing of Genomic Data for GWASs in Decentralized Networks

The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on a wide scale. In this work, we propose a preventive approach for privacy-preserving sharing of genomic data in decentralized networks for genome-wide association studies (GWASs), which are widely used to discover associations between genotypes and phenotypes. The key components of this work are a decentralized secure network with a privacy-preserving sharing protocol, and a gene fragmentation framework that is trainable in an end-to-end manner. Our experiments on real datasets show the effectiveness of our privacy-preserving approaches as well as significant improvements in efficiency when compared with recent, related algorithms.
