An Efficient Method for Securely Storing and Handling of Genomic Data

With the growth of cloud computing, there is increasing interest in storing and processing genomic data on cloud platforms. However, existing file formats for genomic data do not guarantee security if the data is leaked by an attacker. In this paper, we therefore propose an encrypted version of the variant call format (VCF), one of the most widely used file formats for storing genomic variants. The encrypted variant call format (eVCF) supports privacy-preserving processing of encrypted data and requires only a few more seconds of processing time and a slightly larger file size than the existing VCF.
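The abstract does not spell out the eVCF layout, so the following is only an illustrative sketch of the general idea of field-level encryption in a VCF-like record, not the paper's scheme. It assumes that the nine fixed VCF columns stay in plaintext so that records can still be located by position, and that only the per-sample columns are encrypted. To stay within the Python standard library, it uses a counter-mode keystream built from HMAC-SHA256 as a stand-in for a block cipher in counter mode such as AES-CTR. The function names, the nonce construction, and the hex encoding are all hypothetical.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream from HMAC-SHA256 (stand-in for AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))


def _nonce(cols: list, sample_index: int) -> bytes:
    # Assumption: CHROM, POS, REF, ALT plus the sample index identify a
    # field uniquely, so each field gets its own keystream and any record
    # can be decrypted without reading the rest of the file.
    chrom, pos, ref, alt = cols[0], cols[1], cols[3], cols[4]
    return f"{chrom}:{pos}:{ref}:{alt}:{sample_index}".encode()


def encrypt_record(line: str, key: bytes) -> str:
    """Encrypt the sample columns (index 9 onward) of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    fixed, samples = cols[:9], cols[9:]
    encrypted = []
    for i, field in enumerate(samples):
        data = field.encode()
        ks = keystream(key, _nonce(cols, i), len(data))
        # Hex keeps the record tab-delimited text.
        encrypted.append(_xor(data, ks).hex())
    return "\t".join(fixed + encrypted)


def decrypt_record(line: str, key: bytes) -> str:
    """Invert encrypt_record for one line."""
    cols = line.rstrip("\n").split("\t")
    fixed, samples = cols[:9], cols[9:]
    decrypted = []
    for i, field in enumerate(samples):
        data = bytes.fromhex(field)
        ks = keystream(key, _nonce(cols, i), len(data))
        decrypted.append(_xor(data, ks).decode())
    return "\t".join(fixed + decrypted)


if __name__ == "__main__":
    key = b"k" * 32  # demo key only; a real key would come from a key store
    record = "1\t12345\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1"
    enc = encrypt_record(record, key)
    print(enc)
    print(decrypt_record(enc, key) == record)
```

A sketch like this provides confidentiality only. It does not authenticate the ciphertext, and hex encoding doubles the size of the encrypted fields, so it says nothing about the time and size overheads reported for eVCF.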
