Preserving Patient Privacy When Sharing Same-Disease Data

Medical and health data are often collected to study a specific disease. For such same-disease microdata, a privacy disclosure occurs as soon as an individual is known to be in the microdata, because membership alone reveals the disease. Individuals in same-disease microdata are thus subject to higher disclosure risk than those in microdata covering different diseases. This important problem has been overlooked in data-privacy research and practice; no prior study has addressed it. In this study, we analyze the disclosure risk for individuals in same-disease microdata and propose a new metric appropriate for measuring disclosure risk in this situation. We design and implement an efficient algorithm for anonymizing same-disease data that minimizes disclosure risk while preserving as much data utility as possible. An experimental study on real patient and population data shows that traditional reidentification risk measures underestimate the actual disclosure risk for individuals in same-disease microdata, and that the proposed approach is very effective in reducing the actual risk for such data. This study suggests that privacy protection policy and practice for sharing medical and health data should consider not only the individuals’ identifying attributes but also the health and disease information contained in the data. It is recommended that data-sharing entities employ a statistical approach, instead of HIPAA's Safe Harbor policy, when sharing same-disease microdata.
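
To make the underestimation argument concrete, the following is a minimal illustrative sketch in Python. It is not the metric or algorithm proposed in the study, which the abstract does not specify. It assumes, purely for illustration, that the traditional reidentification risk of a record is 1/F, where F is the number of people in the population sharing the record's quasi-identifier values, and that the risk of being recognized as a member of the same-disease microdata is f/F, where f is the number of such people who appear in the microdata. The function name `group_risks` and the example values are hypothetical.

```python
from collections import Counter


def group_risks(sample, population):
    """Compare two illustrative risk measures per quasi-identifier group.

    `sample` and `population` are lists of hashable quasi-identifier tuples
    (hypothetical inputs). Returns {qi: (reid_risk, membership_risk)}.
    Both formulas are assumptions made for illustration only.
    """
    sample_counts = Counter(sample)
    pop_counts = Counter(population)
    risks = {}
    for qi, f in sample_counts.items():
        F = pop_counts[qi]
        if F < f:
            raise ValueError(f"population has fewer records than sample for {qi}")
        reid_risk = 1 / F        # assumed traditional measure: link one record to one person
        membership_risk = f / F  # assumed same-disease measure: person is in the microdata at all
        risks[qi] = (reid_risk, membership_risk)
    return risks


# Hypothetical data: (age band, sex, 3-digit ZIP prefix)
population = [("60-69", "F", "021")] * 4 + [("30-39", "M", "021")] * 10
sample = [("60-69", "F", "021")] * 3 + [("30-39", "M", "021")] * 1

risks = group_risks(sample, population)
```

Under these assumptions, the group with three of its four population members in the microdata has a reidentification risk of 1/4 but a membership risk of 3/4: an adversary need not single out a record to learn the disease. The two measures agree only when f = 1.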
