Defending Against Membership Inference Attacks on Beacon Services

Large genomic datasets are now created through numerous activities, including recreational genealogical investigations, biomedical research, and clinical care. At the same time, genomic data has become valuable for reuse beyond its initial point of collection, but privacy concerns often hinder access. Over the past several years, Beacon services have emerged to broaden accessibility to such data. These services enable users to query for the presence of a particular minor allele in a private dataset, information that can help care providers determine whether a genomic variation is spurious or has some known clinical indication. However, various studies have shown that even this limited access model can leak whether individuals are members of the underlying dataset. Several approaches for mitigating this vulnerability have been proposed, but they are limited in that they 1) typically rely on heuristics and 2) offer probabilistic privacy guarantees but neglect utility. In this paper, we present a novel algorithmic framework to ensure privacy in a Beacon service setting with a minimal number of query response flips (e.g., changing a positive response to a negative). Specifically, we represent this problem as a combinatorial optimization problem in both the batch setting (where queries arrive all at once) and the online setting (where queries arrive sequentially). The former setting has been the primary focus in prior literature, whereas real Beacons allow sequential queries, motivating the latter investigation. We present principled algorithms in this framework with both privacy and, in some cases, worst-case utility guarantees. Moreover, through an extensive experimental evaluation, we show that the proposed approaches significantly outperform the state of the art in terms of privacy and utility.
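To make the query model concrete, the following is a minimal sketch of a Beacon that answers presence queries for a minor allele and applies a fixed set of response flips. It is an illustration only, not the algorithm proposed in the paper: the data layout (each genome as a set of positions carrying the minor allele) and the names `beacon_response`, `masked_response`, and `flipped` are assumptions, and how the flip set is chosen is left out entirely.

```python
from typing import Iterable, List, Set

# Assumption: each genome is the set of positions where the individual
# carries the minor allele. This layout is hypothetical.
Genome = Set[int]


def beacon_response(dataset: List[Genome], position: int) -> bool:
    """Truthful Beacon answer: does anyone in the dataset carry the minor allele?"""
    return any(position in genome for genome in dataset)


def masked_response(dataset: List[Genome], position: int, flipped: Set[int]) -> bool:
    """Answer a query, inverting the truthful response for positions in `flipped`."""
    truthful = beacon_response(dataset, position)
    return (not truthful) if position in flipped else truthful


def num_flips(queries: Iterable[int], flipped: Set[int]) -> int:
    """Utility cost of a query batch: how many distinct queried positions are flipped."""
    return len(set(queries) & flipped)


dataset = [{1, 2}, {2, 3}]
flipped = {2}  # hypothetical flip set; selecting it is the optimization problem
answers = [masked_response(dataset, q, flipped) for q in (1, 2, 5)]
```

In this toy setup the query for position 2 would truthfully be positive but is reported as negative, at a cost of one flip; the framework described above is concerned with choosing such a set so that privacy holds with as few flips as possible.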
