Preventing Re-identification While Supporting GWAS

This chapter discusses how clinical data can be published so that re-identification attacks are prevented while the validation of Genome-Wide Association Studies (GWAS) remains possible. After motivating the problem in Sect. 4.1, we give an overview of an approach that addresses it [5, 9] in Sects. 4.2 and 4.3. The approach extracts potentially linkable clinical features and modifies them so that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and the specific sets of clinical features that correspond to GWAS-related diseases.
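To make the notion of a "potentially linkable" feature concrete, the following minimal sketch flags combinations of diagnosis codes that appear in fewer than k patient records, since such combinations could narrow a genomic sequence down to a small number of patients. It only illustrates the general idea and is not the algorithm of [5, 9]; the function name `unsafe_itemsets`, the parameters `k` and `max_size`, and the code strings in the toy records are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def unsafe_itemsets(records, linkable_codes, k, max_size=2):
    """Return the combinations of linkable codes (up to max_size codes each)
    that occur in at least one record but in fewer than k records."""
    counts = Counter()
    for codes in records:
        # Only codes an attacker could plausibly know are considered.
        present = sorted(set(codes) & linkable_codes)
        for size in range(1, max_size + 1):
            for combo in combinations(present, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical toy data: each record is one patient's set of diagnosis codes.
records = [
    {"250.00", "401.1"},
    {"250.00", "401.1"},
    {"250.00", "272.4"},
    {"401.1"},
]
linkable = {"250.00", "401.1", "272.4"}

risky = unsafe_itemsets(records, linkable, k=2)
for combo, n in sorted(risky.items()):
    print(combo, "appears in", n, "record(s)")
```

In this toy dataset, any combination containing "272.4" is flagged because it matches a single patient. An anonymization method would then have to modify such codes, for example by generalizing or suppressing them, while leaving the code sets needed for GWAS validation intact.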

[1] Ashwin Machanavajjhala, et al. l-Diversity: Privacy Beyond k-Anonymity, 2006, ICDE.

[2] Teri A Manolio, et al. Collaborative genome-wide association studies of diverse diseases: programs of the NHGRI's office of population genomics, 2009, Pharmacogenomics.

[3] Joshua C. Denny, et al. The disclosure of diagnosis codes can breach research participants' privacy, 2010, J. Am. Medical Informatics Assoc.

[4] Pierangela Samarati, et al. Protecting Respondents' Identities in Microdata Release, 2001, IEEE Trans. Knowl. Data Eng.

[5] Chris Clifton, et al. Thoughts on k-Anonymization, 2006, 22nd International Conference on Data Engineering Workshops (ICDEW'06).

[6] Panos Kalnis, et al. Local and global recoding methods for anonymizing set-valued data, 2010, The VLDB Journal.

[7] Aris Gkoulalas-Divanis, et al. PCTA: privacy-constrained clustering-based transaction data anonymization, 2011, PAIS '11.

[8] Bradley Malin, et al. COAT: COnstraint-based anonymization of transactions, 2010, Knowledge and Information Systems.

[9] David J. DeWitt, et al. Incognito: efficient full-domain K-anonymity, 2005, SIGMOD '05.

[10] Chedy Raïssi, et al. ρ-uncertainty, 2010, Proc. VLDB Endow.

[11] Khaled El Emam, et al. Protecting privacy using k-anonymity, 2008, Journal of the American Medical Informatics Association: JAMIA.

[12] Lucila Ohno-Machado, et al. Effects of Data Anonymization by Cell Suppression on Descriptive Statistics and Predictive Modeling Performance, 2002, J. Am. Medical Informatics Assoc.

[13] B. Malin, et al. Anonymization of electronic medical records for validating genome-wide association studies, 2010, Proceedings of the National Academy of Sciences.

[14] Panos Kalnis, et al. Privacy-preserving anonymization of set-valued data, 2008, Proc. VLDB Endow.

[15] Latanya Sweeney, et al. k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[16] Philip S. Yu, et al. Privacy-preserving data publishing: A survey of recent developments, 2010, CSUR.

[17] Jeffrey F. Naughton, et al. Anonymization of Set-Valued Data via Top-Down, Local Generalization, 2009, Proc. VLDB Endow.

[18] Philip S. Yu, et al. Anonymizing transaction databases for publication, 2008, KDD.

[19] Charu C. Aggarwal, et al. On k-Anonymity and the Curse of Dimensionality, 2005, VLDB.