Towards Secure and Fast Mapping of Genomic Sequences on Public Clouds

Rapid advances in genomic technologies have led to exponential growth in genomic data. On the one hand, clinics and research institutions must address security, because the privacy of this data needs to be protected. On the other hand, they seek ways to improve the scalability and performance of genomic applications so that they can handle large amounts of data as well as heavy computation. Whereas existing approaches sacrifice one for the other, we aim to achieve all three goals: security, scalability, and performance. In this paper, we design a complete secure framework for genomic data processing on public clouds. Based on this framework, we propose a 3-encryption-scheme model for genomic sequence mapping (3EGSM), an important phase of genomic computation. The model protects not only the genomic sequences but also the intermediate and final computation results during processing on public clouds. We evaluate the proposed framework through extensive experiments using real genomic data. The experimental results show that the proposed framework reduces the sequential mapping time by up to 75% compared to a baseline approach that addresses only security. The results also show that the framework achieves high speedup under parallel processing.
