Privacy Preserving Fisher’s Exact Test on Genomic Data

The privacy of genomic data has become increasingly important as genome sequencing becomes more readily available for research. Protecting this data is imperative as it is used to advance medicine. In this paper, we propose a new privacy preserving Fisher’s Exact Test algorithm for genomic data based on the Boneh-Goh-Nissim (BGN) cryptosystem. To the best of our knowledge, this approach has not been attempted before. Because BGN is homomorphic, researchers can compute the correct test results while keeping the data private, without ever decrypting the data itself. We investigate the use of the BGN cryptosystem for statistical computations, Fisher’s Exact Test in particular, and analyze its security, efficiency, and correctness in real-world genomic data research. We implement our BGN-based privacy preserving Fisher’s Exact Test algorithm and test it extensively on real genomic data from an international genome database. The results show that our algorithm is efficient and practical.
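
For readers unfamiliar with the statistic itself, the following is a minimal plaintext sketch of a two-sided Fisher’s Exact Test on a 2 x 2 contingency table. It is not the BGN-based protocol proposed here: it works on unencrypted counts, and the function names (`hypergeom_pmf`, `fisher_exact_two_sided`) and the example table are illustrative assumptions rather than anything taken from the paper.

```python
from math import comb  # requires Python 3.8+


def hypergeom_pmf(a, row1, row2, col1):
    """Probability of a 2x2 table whose top-left cell is `a`, with margins fixed."""
    n = row1 + row2
    return comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the table [[a, b], [c, d]] (plaintext counts).

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = hypergeom_pmf(a, row1, row2, col1)
    lo = max(0, col1 - row2)   # smallest feasible top-left cell
    hi = min(row1, col1)       # largest feasible top-left cell
    p = 0.0
    for x in range(lo, hi + 1):
        px = hypergeom_pmf(x, row1, row2, col1)
        # small relative tolerance guards against floating-point ties
        if px <= p_obs * (1 + 1e-9):
            p += px
    return min(p, 1.0)


# Hypothetical example: allele counts in cases (row 1) and controls (row 2).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

In a privacy preserving setting, the four cell counts would be obtained without exposing individual records; this sketch only shows the arithmetic applied to those counts.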
