Membership Privacy in MicroRNA-based Studies

The continuous decrease in cost of molecular profiling tests is revolutionizing medical research and practice, but it also raises new privacy concerns. One of the first attacks against privacy of biological data, proposed by Homer et al. in 2008, showed that, by knowing parts of the genome of a given individual and summary statistics of a genome-based study, it is possible to detect if this individual participated in the study. Since then, a lot of work has been carried out to further study the theoretical limits and to counter the genome-based membership inference attack. However, genomic data are by no means the only or the most influential biological data threatening personal privacy. For instance, whereas the genome informs us about the risk of developing some diseases in the future, epigenetic biomarkers, such as microRNAs, are directly and deterministically affected by our health condition including most common severe diseases. In this paper, we show that the membership inference attack also threatens the privacy of individuals contributing their microRNA expressions to scientific studies. Our results on real and public microRNA expression data demonstrate that disease-specific datasets are especially prone to membership detection, offering a true-positive rate of up to 77% at a false-negative rate of less than 1%. We present two attacks: one relying on the L_1 distance and the other based on the likelihood-ratio test. We show that the likelihood-ratio test provides the highest adversarial success and we derive a theoretical limit on this success. In order to mitigate the membership inference, we propose and evaluate both a differentially private mechanism and a hiding mechanism. We also consider two types of adversarial prior knowledge for the differentially private mechanism and show that, for relatively large datasets, this mechanism can protect the privacy of participants in miRNA-based studies against strong adversaries without degrading the data utility too much. Based on our findings and given the current number of miRNAs, we recommend to only release summary statistics of datasets containing at least a couple of hundred individuals.

[1]  Somesh Jha,et al.  Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing , 2014, USENIX Security Symposium.

[2]  Haixu Tang,et al.  Learning your identity and disease from research papers: information leaks in genome wide association study , 2009, CCS.

[3]  Stephen E. Fienberg,et al.  Scalable privacy-preserving data sharing methodology for genome-wide association studies , 2014, J. Biomed. Informatics.

[4]  Vitaly Shmatikov,et al.  Privacy-preserving data exploration in genome-wide association studies , 2013, KDD.

[5]  Zhicong Huang,et al.  Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies , 2015, CCS.

[6]  Sabine C. Mueller,et al.  miRNAs can be generally associated with human pathologies as exemplified for miR-144* , 2014, BMC Medicine.

[7]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[8]  Andrew P Feinberg,et al.  Epigenetics at the Crossroads of Genes and the Environment. , 2015, JAMA.

[9]  klaguia Epigenome data release: a participant-centered approach to privacy protection , 2016 .

[10]  Tim Roughgarden,et al.  Universally utility-maximizing privacy mechanisms , 2008, STOC '09.

[11]  C. Burge,et al.  Most mammalian mRNAs are conserved targets of microRNAs. , 2008, Genome research.

[12]  Ninghui Li,et al.  Membership privacy: a unifying framework for privacy definitions , 2013, CCS.

[13]  N. Dubrawsky Cancer statistics , 1989, CA: a cancer journal for clinicians.

[14]  Michael I. Jordan,et al.  Genomic privacy and limits of individual detection in a pool , 2009, Nature Genetics.

[15]  Thomas Steinke,et al.  Robust Traceability from Trace Amounts , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[16]  Sabine C. Mueller,et al.  A blood based 12-miRNA signature of Alzheimer disease patients , 2013, Genome Biology.

[17]  A. Sparks,et al.  The Genomic Landscapes of Human Breast and Colorectal Cancers , 2007, Science.

[18]  Yi Jing,et al.  Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs , 2015, Proceedings of the National Academy of Sciences.

[19]  S. Nelson,et al.  Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays , 2008, PLoS genetics.

[20]  Stephen E. Fienberg,et al.  Privacy-Preserving Data Sharing for Genome-Wide Association Studies , 2012, J. Priv. Confidentiality.

[21]  Bo Peng,et al.  To Release or Not to Release: Evaluating Information Leaks in Aggregate Human-Genome Data , 2011, ESORICS.

[22]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[23]  G. Marchant,et al.  The ghost in our genes: legal and ethical implications of epigenetics. , 2009, Health matrix.

[24]  M. Mehler,et al.  Advances in Epigenetics and Epigenomics for Neurodegenerative Diseases , 2011, Current neurology and neuroscience reports.

[25]  E. S. Pearson,et al.  On the Problem of the Most Efficient Tests of Statistical Hypotheses , 1933 .

[26]  Christina Backes,et al.  Toward the blood-borne miRNome of human diseases , 2011, Nature Methods.

[27]  Peter A. Jones,et al.  The Epigenomics of Cancer , 2007, Cell.

[28]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[29]  Stephen E. Fienberg,et al.  Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases , 2014, Privacy in Statistical Databases.

[30]  H. Horvitz,et al.  MicroRNA expression profiles classify human cancers , 2005, Nature.

[31]  Andreas Keller,et al.  Privacy in Epigenetics: Temporal Linkability of MicroRNA Expression Profiles , 2016, USENIX Security Symposium.