The application of differential privacy to health data

Differential privacy has attracted considerable attention in recent years as a general model for protecting personal information that is used and disclosed for secondary purposes. It has also been proposed as an appropriate model for health data. In this paper we review the current literature on differential privacy and highlight important general limitations of the model and of the proposed mechanisms. We then examine some practical challenges in applying differential privacy to health data. The review concludes by identifying the issues that researchers and practitioners need to address to increase the adoption of differential privacy for health data.
