Utility-Friendly Heterogenous Generalization in Privacy Preserving Data Publishing

K-anonymity is one of the most important and widely investigated anonymity models, and various techniques have been proposed to achieve it. Generalization is a common one. In a typical generalization approach, the tuples in a table are first divided into quasi-identifier (QI) groups such that each QI-group contains at least K tuples. In general, the utility of the anonymized data can be enhanced if the size of each QI-group is reduced. Motivated by this observation, we propose a linking-based anonymity model, which achieves K-anonymity with QI-groups of size less than K. To implement the linking-based anonymity model, we propose a simple yet efficient heuristic local recoding method. Extensive experiments on real data sets show that our approach significantly improves utility compared to state-of-the-art methods.
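As a minimal sketch of the conventional generalization step described above (not the linking-based model proposed here), the following Python snippet generalizes each QI-group to the range of its QI values and then checks K-anonymity. The function names, the toy table, and the hand-picked grouping are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def is_k_anonymous(rows, qi_columns, k):
    """Return True if every combination of QI values occurs at least k times."""
    counts = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(n >= k for n in counts.values())

def generalize_group(rows, qi_columns):
    """Replace each numeric QI value in a group with the group's (min, max) range."""
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
              for c in qi_columns}
    # Sensitive attributes are left untouched; only QI columns are overwritten.
    return [{**r, **ranges} for r in rows]

# Hypothetical toy table: age and zip are QI attributes, disease is sensitive.
table = [
    {"age": 23, "zip": 12000, "disease": "flu"},
    {"age": 27, "zip": 12500, "disease": "cold"},
    {"age": 41, "zip": 30000, "disease": "asthma"},
    {"age": 45, "zip": 31000, "disease": "flu"},
]
qi = ["age", "zip"]

# Assumed partition into QI-groups of size 2 (picked by hand for this example).
groups = [table[:2], table[2:]]
published = [row for g in groups for row in generalize_group(g, qi)]
```

With this partition, the raw table is not 2-anonymous, while the generalized table is. Smaller groups give tighter ranges, which is the utility effect the abstract refers to.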
