Differential privacy under dependent tuples - the case of genomic privacy

MOTIVATION: Rapid progress in genome sequencing has made genomic data widely available. However, because of growing concerns about participants' sensitive information, access to the results and data of genomic studies is restricted to trusted individuals. At the same time, biomedical discovery depends on open access to genomic databases. Privacy-preserving mechanisms can grant wider access to such data while protecting the data owners. In particular, there has been growing interest in applying differential privacy (DP) when sharing summary statistics about genomic data. DP is mathematically rigorous, but it does not account for dependence between the tuples in a database, and such dependence may degrade the privacy guarantees that DP offers.

RESULTS: Focusing on genomic databases, we demonstrate this drawback of DP and propose techniques to mitigate it. First, using a real-world genomic dataset, we demonstrate the feasibility of an inference attack on differentially private query results that exploits correlations between the tuples in the dataset. The results show that an adversary can infer sensitive genomic data about a user from differentially private query results by exploiting correlations between the genomes of family members. Second, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets that attains privacy guarantees while taking the dependence between tuples into account. Evaluating our mechanism on different genomic datasets, we empirically demonstrate that it can achieve up to 50% better privacy than traditional DP-based solutions.

AVAILABILITY: https://github.com/nourmadhoun/Differential-privacy-genomic-inference-attack

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
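To make the setting concrete, the following is a minimal sketch of a differentially private count query over one SNP, released with the standard Laplace mechanism. It is not the mechanism proposed in this work. The function names, the toy genotype list, and the choice of two relatives are hypothetical. The `group_sensitivity` helper applies the generic group-privacy bound (scaling the sensitivity by the number of tuples that may change together), which is only one conservative way to account for dependent tuples.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(genotypes, epsilon, sensitivity=1.0, rng=None):
    """Release the number of minor-allele carriers with Laplace noise.

    For a count query over independent tuples, changing one record
    changes the count by at most 1, so the sensitivity defaults to 1.
    """
    rng = rng or random.Random()
    true_count = sum(1 for g in genotypes if g > 0)
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def group_sensitivity(base_sensitivity, n_relatives):
    """Conservative bound (an assumption of this sketch, not the paper's method).

    If one individual's genome is correlated with `n_relatives` other
    tuples, treat all of them as able to change together.
    """
    return base_sensitivity * (1 + n_relatives)


# Hypothetical toy data: minor-allele counts (0, 1, or 2) at a single SNP.
genotypes = [0, 1, 2, 0, 1, 1, 0, 2]
rng = random.Random(0)

# Standard DP release, which assumes tuples are independent.
standard = noisy_count(genotypes, epsilon=1.0, rng=rng)

# Release that assumes each participant has two relatives in the dataset.
adjusted = noisy_count(
    genotypes,
    epsilon=1.0,
    sensitivity=group_sensitivity(1.0, n_relatives=2),
    rng=rng,
)
```

Scaling the sensitivity this way triples the noise scale in the example, which shows the utility cost of a worst-case treatment of dependence. A mechanism that models the actual strength of the correlations between family members could, in principle, add less noise for the same protection.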
