Thoughts on k-Anonymization

k-Anonymity is a privacy protection method that ensures released data cannot be traced to an individual. In a k-anonymous dataset, every combination of identifying values occurs in at least k tuples. Many algorithms have recently been proposed to achieve optimal and practical k-anonymity, each with its own assumptions and restrictions and its own metrics for measuring quality. This paper presents the family of clustering-based algorithms, which are more flexible and attempt to improve precision by ignoring the restrictions of user-defined Domain Generalization Hierarchies. The main finding of the paper is that metrics may behave differently across algorithms and may not correlate with the accuracy of some applications on the output data.
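The k-anonymity condition stated above can be illustrated with a minimal Python sketch. It is not one of the algorithms discussed in the paper; it only checks whether a table already satisfies the condition. The function name `is_k_anonymous`, the attribute names `zip` and `age`, and the sample rows are all hypothetical and chosen for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that appears in `rows` appears in at least k rows."""
    # Count how many rows share each combination of identifying values.
    counts = Counter(
        tuple(row[attr] for attr in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in counts.values())


# Hypothetical, already-generalized table: zip codes are truncated and
# ages are bucketed, so rows fall into two groups of two.
rows = [
    {"zip": "476**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "476**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "479**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "479**", "age": "30-39", "diagnosis": "diabetes"},
]

print(is_k_anonymous(rows, ["zip", "age"], 2))  # True
print(is_k_anonymous(rows, ["zip", "age"], 3))  # False
```

Each group here contains exactly two rows, so the table is 2-anonymous but not 3-anonymous with respect to the chosen attributes. Which attributes count as identifying is an assumption the data publisher has to make; the check itself says nothing about how much information the generalization has cost.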
