Case Study on Electronic Medical Records Data

This chapter presents a case study demonstrating that attackers can use diagnosis codes to link GWAS-related patient data with external, identified data sources, thereby re-identifying patients and inferring their DNA sequences. Section 5.1 discusses the characteristics of the electronic medical record datasets used in the case study. Sect. 5.2 then demonstrates that a popular suppression-based strategy may not prevent the attack considered in Chap. 3 without excessively distorting the data. This is in contrast to the algorithms presented in Chap. 3, as explained in Sect. 5.3.
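
To make the linkage idea concrete, the following is a minimal Python sketch on hypothetical toy data. It simplifies the attack to exact matching of complete diagnosis-code sets: a de-identified research record is re-identified when its code set is unique in the research data and matches exactly one named individual in an identified source. The record IDs, names, codes, and the `link` function are illustrative assumptions. The sketch is not the attack model or datasets used in this case study.

```python
from collections import Counter

# Hypothetical de-identified research records: internal ID -> set of ICD-9 codes.
# In a GWAS setting, each record would also be tied to a DNA sequence.
research = {
    "r1": {"250.00", "401.1"},
    "r2": {"250.00", "401.1"},
    "r3": {"250.00", "272.4", "493.90"},
}

# Hypothetical identified source (e.g. discharge records) listing the same codes by name.
identified = {
    "Alice": {"250.00", "272.4", "493.90"},
    "Bob": {"250.00", "401.1"},
}


def link(research, identified):
    """Return {research_id: name} for records whose full code set is unique in
    the research data and matches exactly one identified individual."""
    # Count how many research records share each exact set of codes.
    counts = Counter(frozenset(codes) for codes in research.values())
    matches = {}
    for rid, codes in research.items():
        key = frozenset(codes)
        if counts[key] != 1:
            continue  # code set shared by several records, so the link is ambiguous
        names = [name for name, c in identified.items() if frozenset(c) == key]
        if len(names) == 1:
            matches[rid] = names[0]
    return matches


print(link(research, identified))  # {'r3': 'Alice'}
```

Here Bob's codes match both `r1` and `r2`, so he cannot be pinned to a single record. `r3` has a unique combination of codes and is linked to Alice, which would expose whatever genomic data accompanies that record.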
