Conclusions and Open Research Challenges

In this book, we have explained why EMR data need to be disseminated in a way that prevents patient re-identification. We have provided an overview of data sharing policies and regulations, which serve as a first line of defence but cannot provide computational privacy guarantees, and then reviewed several anonymization approaches that can be used to guard against this threat. Specifically, we have surveyed anonymization principles and algorithms for demographics and diagnosis codes, which are highly replicable, available, and distinguishable, and thus may lead to patient re-identification. We have also discussed anonymity threats to patient information contained in genomic data, together with methods for publishing such information.
