PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

Searchable symmetric encryption (SSE) has been used to protect the confidentiality of genomic data while supporting substring search and range queries on genomic sequences, but it has not yet been applied to protecting single nucleotide polymorphism (SNP)-phenotype data. In this article, we propose PrivGenDB, a novel model for securely storing genomic data outsourced to an honest-but-curious cloud server and for efficiently executing different types of queries on it. To instantiate PrivGenDB, we use SSE to ensure confidentiality while executing queries over encrypted genomic data, phenotypes, and other information about individuals, helping analysts and clinicians in their analysis and care. To the best of our knowledge, the PrivGenDB construction is the first SSE-based approach that ensures the confidentiality of shared SNP-phenotype data through encryption while keeping the query process efficient and scalable for biomedical research and care. Furthermore, it supports a variety of query types on genomic data, including count queries, Boolean queries, and k′-out-of-k match queries. Finally, PrivGenDB handles datasets containing both genotypes and phenotypes, and it also stores and manages other metadata, such as gender and ethnicity, privately. Experimental evaluations on a dataset with 5,000 records and 1,000 SNPs show that a count/Boolean query and a k′-out-of-k match query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively, outperforming existing schemes.
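The three query types named above differ mainly in what they return. The Python sketch below illustrates their intended semantics on plaintext records only; in PrivGenDB these queries are evaluated by the cloud server over SSE-encrypted data, and that encrypted protocol is not modeled here. The record layout, the 0/1/2 genotype encoding, and all function names (`count_query`, `boolean_query`, `k_out_of_k_match`) are hypothetical and chosen for illustration. Reading a k′-out-of-k match as "at least k′ of k SNP criteria are satisfied" is likewise an assumption based on the query's name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: one row per individual (not PrivGenDB's storage format).
@dataclass
class Record:
    rid: int                   # record identifier
    genotype: Dict[str, int]   # SNP id -> genotype value; 0/1/2 minor-allele count is an assumed encoding
    phenotype: str             # e.g. "case" or "control"
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. gender, ethnicity

Criterion = Tuple[str, int]            # (SNP id, required genotype value)
Predicate = Callable[[Record], bool]   # any Boolean combination of conditions

def matches(rec: Record, crit: Criterion) -> bool:
    """True if the record carries the required value at the given SNP."""
    snp, value = crit
    return rec.genotype.get(snp) == value

def boolean_query(db: List[Record], pred: Predicate) -> List[int]:
    """Return the IDs of all records satisfying the Boolean predicate."""
    return [r.rid for r in db if pred(r)]

def count_query(db: List[Record], pred: Predicate) -> int:
    """Return only the number of matching records, not their IDs."""
    return sum(1 for r in db if pred(r))

def k_out_of_k_match(db: List[Record], criteria: List[Criterion], k_prime: int) -> List[int]:
    """Return IDs of records that satisfy at least k_prime of the k criteria."""
    if not 1 <= k_prime <= len(criteria):
        raise ValueError("k_prime must be between 1 and k = len(criteria)")
    # Each satisfied criterion contributes True (== 1) to the sum.
    return [r.rid for r in db
            if sum(matches(r, c) for c in criteria) >= k_prime]

# Toy data, invented for illustration.
db = [
    Record(1, {"rs1": 0, "rs2": 1, "rs3": 2}, "case", {"gender": "F"}),
    Record(2, {"rs1": 0, "rs2": 0, "rs3": 2}, "control", {"gender": "M"}),
    Record(3, {"rs1": 1, "rs2": 1, "rs3": 2}, "case", {"gender": "M"}),
]

# Conjunctive query: rs1 = 0 AND rs3 = 2 AND phenotype = "case".
pred: Predicate = lambda r: (matches(r, ("rs1", 0))
                             and matches(r, ("rs3", 2))
                             and r.phenotype == "case")
criteria = [("rs1", 0), ("rs2", 1), ("rs3", 2)]

print(count_query(db, pred))              # 1
print(boolean_query(db, pred))            # [1]
print(k_out_of_k_match(db, criteria, 2))  # [1, 2, 3]
print(k_out_of_k_match(db, criteria, 3))  # [1]
```

The k′-out-of-k form tolerates partial matches that a strict conjunction over all k SNPs would reject: records 2 and 3 above each satisfy only two of the three criteria, so they are returned for k′ = 2 but not for k′ = 3.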
