Utility-Constrained Electronic Health Record Data Publishing Through Generalization and Disassociation

Data containing diagnosis codes are often derived from electronic health records and shared to enable large-scale, low-cost medical studies. However, sharing such data may disclose patients' identities, which must be prevented to address privacy concerns and to comply with legislation worldwide. To preserve both data privacy and data utility, a utility-constrained anonymization approach can be applied. This approach transforms a given dataset so that the probability of identity disclosure based on diagnosis codes is limited, while the data remain useful for the intended studies. In this chapter, we discuss the utility-constrained anonymization approach in detail. Specifically, we explain how utility constraints, which model the requirements of the intended studies, can be formulated and then satisfied through data generalization or disassociation. Furthermore, we review two recently proposed algorithms that follow the utility-constrained approach and represent the current state of the art in preserving data utility. We conclude the chapter by discussing several promising directions for future research.
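To make the idea of a utility constraint concrete, the following minimal sketch illustrates generalization under such constraints. It is an illustration under stated assumptions, not the chapter's algorithms: the constraint sets, the diagnosis codes, the function names, and the simplified privacy check (every published record must be shared by at least k patients) are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical utility constraints: each set groups diagnosis codes that an
# intended study is assumed to treat as interchangeable.
UTILITY_CONSTRAINTS = [
    frozenset({"250.00", "250.01", "250.02"}),
    frozenset({"401.0", "401.1", "401.9"}),
]


def generalize(record, constraints):
    """Replace each code by the constraint set that contains it.

    Codes not covered by any constraint are kept as singleton items.
    """
    items = set()
    for code in record:
        group = next((c for c in constraints if code in c), None)
        items.add(group if group is not None else frozenset({code}))
    return frozenset(items)


def satisfies_utility(generalized_record, constraints):
    """A generalized item is acceptable if it is a single code or stays
    entirely within one utility constraint."""
    return all(
        len(item) == 1 or any(item <= c for c in constraints)
        for item in generalized_record
    )


def min_group_size(records):
    """Size of the smallest group of identical records (simplified privacy
    check: a value of k bounds the disclosure probability by 1/k)."""
    return min(Counter(records).values())


# Toy dataset: one set of diagnosis codes per patient.
records = [
    {"250.00", "401.1"},
    {"250.01", "401.9"},
    {"250.02", "401.0"},
]

original = [frozenset(r) for r in records]
anonymized = [generalize(r, UTILITY_CONSTRAINTS) for r in records]
```

In this toy dataset every original record is unique, whereas after generalization all three patients share the same published record, and each generalized item stays within a single constraint. Disassociation, the second transformation mentioned above, is not covered by this sketch.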
