Reconsidering Anonymization-Related Concepts and the Term “Identification” Against the Backdrop of the European Legal Framework

Sharing data in biomedical contexts has become increasingly relevant, but privacy concerns constrain the free sharing of individual-level data. Data protection law protects only data relating to an identifiable individual, whereas “anonymous” data are free to be used by everybody. Terms related to anonymization are often used inconsistently across domains such as statistics and law. The crucial term “identification” seems especially hard to define, since its definition presupposes the existence of identifying characteristics, which leads to some circularity. In this article, we discuss important terms from a legal perspective, which we outline before presenting issues related to the usage of terms such as unique “identifiers,” “quasi-identifiers,” and “sensitive attributes.” Based on these terms, we have tried to circumvent a circular definition of the term “identification” by making two decisions: first, deciding which (natural) identifier should stand for the individual; second, deciding how to recognize the individual. In addition, we provide an overview of anonymization techniques and methods for preventing re-identification. The discussion of basic notions related to anonymization shows that work remains to be done to achieve a mutual understanding between legal and technical experts concerning some of these notions. Using a dialectical definition process to merge technical and legal perspectives on terms seems important for enhancing this understanding.
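To make the distinction between direct identifiers, quasi-identifiers, and sensitive attributes more concrete, the following minimal Python sketch may help. It is an illustration only and is not taken from the article: the toy records, the field names, and the choice of which attributes count as quasi-identifiers are all assumptions. It groups records by their quasi-identifier values and reports which records are unique on that combination, since such records are the usual candidates for re-identification once direct identifiers have been removed.

```python
from collections import Counter

# Hypothetical toy records; every field name and value is illustrative only.
# "name" plays the role of a direct identifier, "diagnosis" of a sensitive attribute.
records = [
    {"name": "A", "zip": "10115", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"name": "B", "zip": "10115", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"name": "C", "zip": "10117", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
]

# Assumption: these three attributes are treated as quasi-identifiers.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def quasi_key(row, quasi_identifiers):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(row[q] for q in quasi_identifiers)


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(quasi_key(row, quasi_identifiers) for row in rows)


def smallest_class_size(rows, quasi_identifiers):
    """Size of the smallest group; a value of k means every record shares its
    quasi-identifier combination with at least k - 1 others."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_rows(rows, quasi_identifiers):
    """Records whose quasi-identifier combination occurs exactly once."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return [row for row in rows if sizes[quasi_key(row, quasi_identifiers)] == 1]


if __name__ == "__main__":
    print("smallest class size:", smallest_class_size(records, QUASI_IDENTIFIERS))
    for row in unique_rows(records, QUASI_IDENTIFIERS):
        # Dropping "name" alone would not help here: the combination is still unique.
        print("unique on quasi-identifiers:", quasi_key(row, QUASI_IDENTIFIERS))
```

In this toy data set, dropping the direct identifier would still leave one record that is unique on its quasi-identifiers. Which attributes count as quasi-identifiers depends on what background knowledge is assumed, which is one reason the terminology is difficult to fix across disciplines.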
