SecureMA: protecting participant privacy in genetic association meta-analysis

MOTIVATION: Sharing genomic data is crucial to support scientific investigations such as genome-wide association studies. However, recent investigations suggest that the privacy of individual participants in these studies can be compromised, leading to serious concerns and consequences, such as overly restricted access to data.

RESULTS: We introduce a novel cryptographic strategy to securely perform meta-analysis for genetic association studies in large consortia. Our methodology is useful for supporting joint studies among disparate data sites where privacy or confidentiality is of concern. We validate our method using three multisite association studies. Our research shows that genetic associations can be analyzed efficiently and accurately across substudy sites without leaking information on individual participants or site-level association summaries.

AVAILABILITY AND IMPLEMENTATION: Our software for secure meta-analysis of genetic association studies, SecureMA, is publicly available at http://github.com/XieConnect/SecureMA. Our customized secure computation framework is also publicly available at http://github.com/XieConnect/CircuitService.
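To make the idea of a secure meta-analysis concrete, the following is a minimal sketch and not SecureMA's actual protocol, which the abstract does not specify. It assumes a fixed-effect, inverse-variance-weighted meta-analysis and swaps in simple additive secret sharing as a stand-in cryptographic technique. The names `PRIME`, `SCALE`, `secure_sum`, and `meta_analyze`, as well as the example effect sizes, are hypothetical and chosen only for illustration.

```python
import math
import secrets

# Assumptions for this sketch (not taken from the paper):
PRIME = 2**127 - 1  # modulus for share arithmetic
SCALE = 10**6       # fixed-point scale for encoding real numbers


def encode(x):
    """Map a real number to a fixed-point residue modulo PRIME."""
    return round(x * SCALE) % PRIME


def decode(v):
    """Map a residue back to a real number (upper half means negative)."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value, n_parties):
    """Split an encoded value into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(values, n_parties=3):
    """Sum per-site values so that no single party sees any site's input.

    Each computing party accumulates one share per site; only the final
    total is reconstructed.
    """
    totals = [0] * n_parties
    for v in values:
        for k, s in enumerate(share(encode(v), n_parties)):
            totals[k] = (totals[k] + s) % PRIME
    return decode(sum(totals) % PRIME)


def meta_analyze(site_stats):
    """Fixed-effect inverse-variance meta-analysis over (beta, se) pairs."""
    weights = [1.0 / se**2 for _, se in site_stats]
    weighted = [w * b for w, (b, _) in zip(weights, site_stats)]
    sum_w = secure_sum(weights)    # only the aggregate weight is revealed
    sum_wb = secure_sum(weighted)  # only the aggregate weighted effect is revealed
    beta = sum_wb / sum_w
    se = math.sqrt(1.0 / sum_w)
    return beta, se, beta / se


if __name__ == "__main__":
    # Hypothetical per-site effect estimates and standard errors.
    sites = [(0.12, 0.05), (0.08, 0.04), (0.15, 0.06)]
    beta, se, z = meta_analyze(sites)
    print(f"pooled beta={beta:.4f}, se={se:.4f}, z={z:.2f}")
```

In this simplified design only the two aggregate sums are reconstructed, so individual site summaries stay hidden as long as the computing parties do not collude. A production system would also need to address malicious parties, overflow bounds, and secure communication, none of which are modeled here.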
