Anonymization of Data Sets with NULL Values

Releasing, publishing, or transferring microdata is restricted by the need to protect the privacy of data owners. k-anonymity is one of the most widespread concepts for anonymizing microdata, but it does not explicitly cover NULL values, which are nevertheless frequently found in microdata. We study the problem of NULL values (missing values, non-applicable attributes, etc.) for anonymization in detail, present a set of new definitions of k-anonymity that explicitly consider NULL values, and analyze which definition protects against which attacks. We show that an adequate treatment of missing values in microdata can be easily achieved by extending generalization algorithms. In particular, we show how the proposed treatment of NULL values was incorporated into the anonymization tool ANON, which implements generalization and tuple suppression with an application-specific definition of information loss. In a series of experiments, we show that NULL-aware generalization algorithms incur less information loss than standard algorithms.
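
To make the role of NULL values concrete, the following minimal sketch contrasts two possible ways of checking k-anonymity when quasi-identifier values may be missing. It is an illustration under stated assumptions, not the definitions given in the paper and not the ANON implementation: the function names, the example table, and the two matching rules (NULL matches only NULL versus NULL matches any value) are hypothetical choices made here.

```python
from collections import Counter
from typing import Optional, Sequence

# A row holds quasi-identifier values only; None stands for NULL.
Row = Sequence[Optional[str]]


def is_k_anonymous_null_as_value(rows: Sequence[Row], k: int) -> bool:
    """Assumed rule 1: NULL is an ordinary value and matches only NULL."""
    counts = Counter(tuple(r) for r in rows)
    return all(c >= k for c in counts.values())


def matches_wildcard(a: Row, b: Row) -> bool:
    """Two rows match if every attribute is equal or NULL on either side."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def is_k_anonymous_null_as_wildcard(rows: Sequence[Row], k: int) -> bool:
    """Assumed rule 2: NULL may stand for any value (wildcard match).

    Each row must match at least k rows, counting itself.
    """
    return all(
        sum(matches_wildcard(r, other) for other in rows) >= k
        for r in rows
    )


# Hypothetical table with quasi-identifiers (birth year, city).
rows = [
    ("1980", "Graz"),
    ("1980", None),
    ("1975", "Wien"),
    ("1975", "Wien"),
]

strict = is_k_anonymous_null_as_value(rows, 2)
relaxed = is_k_anonymous_null_as_wildcard(rows, 2)
```

Under the first rule the table is not 2-anonymous, because ("1980", "Graz") and ("1980", NULL) each form a group of size one. Under the second rule it is, because the NULL city is allowed to match "Graz". Which rule is appropriate depends on what an attacker is assumed to know about the missing values; the wildcard rule is also not transitive, so it does not partition the table into equivalence classes.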
