Quantifying the costs and benefits of privacy-preserving health data publishing

Cost-benefit analysis is a prerequisite for making good business decisions. In a business environment, companies seek to profit from maximizing the information utility of published data while remaining obligated to protect individual privacy. In this paper, we quantify the trade-off between privacy and data utility in health data publishing in terms of monetary value. We propose an analytical cost model that can help health information custodians (HICs) make better decisions about sharing person-specific health data with other parties. We examine the relevant cost factors associated with the value of anonymized data and the possible damage cost due to potential privacy breaches. Our model guides an HIC in finding the optimal value of publishing health data and can be used with both perturbative and non-perturbative anonymization techniques. Through extensive experiments on real-life data, we show that our approach can identify the optimal value for different privacy models, including k-anonymity, LKC-privacy, and ε-differential privacy, under various anonymization algorithms and privacy parameters.
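
The trade-off described above can be illustrated with a toy calculation. The following is a minimal sketch, not the cost model proposed in the paper: the function names, the linear utility decay, and the dollar figures are all hypothetical assumptions chosen for illustration. The only standard fact it relies on is that under k-anonymity the probability of re-identifying a given record is at most 1/k.

```python
def net_value(k, base_value, damage_cost, utility_loss_rate=0.02):
    """Toy net monetary value of publishing data anonymized with parameter k.

    Assumptions (hypothetical, not taken from the paper):
    - the retained utility decays linearly as k grows;
    - the expected breach cost is the 1/k re-identification bound
      multiplied by a fixed damage cost.
    """
    # Fraction of the data's value that survives anonymization (assumed shape).
    utility = max(0.0, 1.0 - utility_loss_rate * (k - 1))
    # Upper bound on per-record re-identification probability under k-anonymity.
    breach_prob = 1.0 / k
    return base_value * utility - breach_prob * damage_cost


def best_k(candidates, base_value, damage_cost):
    """Return the candidate k with the highest toy net value."""
    return max(candidates, key=lambda k: net_value(k, base_value, damage_cost))


if __name__ == "__main__":
    # Illustrative figures only: data worth 100,000 and a breach costing 50,000.
    k_star = best_k(range(1, 51), base_value=100_000, damage_cost=50_000)
    print(k_star, net_value(k_star, 100_000, 50_000))
```

In this sketch a small k keeps the data valuable but leaves a high expected breach cost, while a large k does the reverse, so the net value peaks at an intermediate k. A real analysis would replace both assumed curves with measured utility and estimated damage costs.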
