Re-identification of Clinical Data Through Diagnosis Information

In this chapter, we present an attack that can associate patients with their diagnosis and genomic information. The attack links published data to external, identified datasets on the basis of diagnosis codes. After motivating the need to prevent the attack, we discuss the types of datasets it involves in Sect. 3.2. Then, in Sect. 3.3, we present a measure that quantifies the susceptibility of a dataset to the attack, together with a study of the attack's feasibility in an Electronic Medical Record (EMR) data publishing scenario. Last, in Sect. 3.4, we discuss a set of measures that capture the utility loss incurred when the published data are shared in a way that prevents the attack.
