Abstracting Anonymization Techniques: A Prerequisite for Selecting a Generalization Algorithm

Abstract

The recent interest in open data has highlighted the crucial problem of anonymization in the context of data publishing. Many research efforts have been devoted to defining techniques that perform such anonymization. However, selecting the most relevant technique and the appropriate algorithm is complex. A successful decision depends first on the ability of data publishers to understand the anonymization techniques and their associated algorithms. In this paper, we focus on choosing among the different algorithms that implement one of these anonymization techniques, namely generalization. Through the abstraction process presented in this paper, we provide data publishers with simplified descriptions of the generalization technique and its algorithms. These descriptions help data publishers with limited programming skills understand the algorithms. We also present other use cases of these abstractions, as well as an experiment conducted to validate them.
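To make the notion of generalization concrete, here is a minimal sketch of full-domain generalization. It is not taken from the paper. Each value of a quasi-identifier attribute is replaced by a coarser value drawn from a generalization hierarchy. The resulting table is k-anonymous when every combination of generalized values occurs at least k times. The two hierarchies (age ranges and masked ZIP digits), the function names, and the brute-force search for the least-generalized level vector are illustrative assumptions. The search is not one of the algorithms the paper compares.

```python
from collections import Counter

# Hypothetical hierarchies: level 0 keeps the raw value, higher levels are coarser.
def generalize_age(age: int, level: int) -> str:
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10          # e.g. 34 -> "30-39"
        return f"{low}-{low + 9}"
    return "*"                          # level 2: attribute fully suppressed

def generalize_zip(zipcode: str, level: int) -> str:
    # Mask the last `level` digits (capped at the code length).
    level = min(level, len(zipcode))
    return zipcode[: len(zipcode) - level] + "*" * level

def generalize(records, levels):
    """Full-domain generalization: one level per attribute, applied to every row."""
    age_level, zip_level = levels
    return [(generalize_age(age, age_level), generalize_zip(z, zip_level))
            for age, z in records]

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

def find_minimal_levels(records, k, max_levels=(2, 5)):
    """Try level vectors in order of total generalization; return the first k-anonymous one."""
    candidates = [(a, z) for a in range(max_levels[0] + 1)
                  for z in range(max_levels[1] + 1)]
    candidates.sort(key=sum)            # stable sort: fewer generalization steps first
    for levels in candidates:
        if is_k_anonymous(generalize(records, levels), k):
            return levels
    return None

# Toy quasi-identifiers (age, ZIP code); the values are invented for illustration.
records = [(34, "75013"), (36, "75014"), (31, "75015"),
           (47, "75013"), (45, "75016"), (42, "75011")]

print(find_minimal_levels(records, k=2))   # (1, 1)
print(generalize(records, (1, 1)))         # ages as decades, ZIPs as "7501*"
```

Raising k to 4 yields the level vector (2, 1), which suppresses age entirely. This shows how a stronger privacy requirement is paid for with a loss of information in the published data.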
