Improved Algorithms for Anonymization of Set-Valued Data

Data anonymization techniques enable publication of detailed information, while providing the privacy of sensitive information in the data against a variety of attacks. Anonymized data describes a set of possible worlds that include the original data. Generalization and suppression have been the most commonly used techniques for achieving anonymization. Some algorithms to protect privacy in the publication of set-valued data were developed by Terrovitis et al.,[16]. The concept of k-anonymity was introduced by Samarati and Sweeny [15], so that every tuple has at least (k-1) tuples identical with it. This concept was modified in [16] in order to introduce K m -anonymity, to limit the effects of the data dimensionality. This approach depends upon generalisation instead of suppression. To handle this problem two heuristic algorithms; namely the DA-algorithm and the AA-algorithm were developed by them.These alogorithms provide near optimal solutions in many cases.In this paper,we improve DA such that undesirable duplicates are not generated and using a FP-growth we display the anonymized data.We illustrate through suitable examples,the efficiency of our proposed algorithm.

[1]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[2]  Panos Kalnis,et al.  Privacy-preserving anonymization of set-valued data , 2008, Proc. VLDB Endow..

[3]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[4]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[5]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[6]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[7]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[8]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[9]  Rajeev Motwani,et al.  Approximation Algorithms for k-Anonymity , 2005 .

[10]  B.K. Tripathy,et al.  A fast p-sensitive l-diversity Anonymisation algorithm , 2011, 2011 IEEE Recent Advances in Intelligent Computational Systems.

[11]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[12]  Samir Khuller,et al.  Achieving anonymity via clustering , 2006, PODS '06.

[13]  Dino Pedreschi,et al.  Anonymity preserving pattern discovery , 2008, The VLDB Journal.

[14]  Qing Zhang,et al.  Aggregate Query Answering on Anonymized Tables , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[15]  Panos Kalnis,et al.  Fast Data Anonymization with Low Information Loss , 2007, VLDB.

[16]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[17]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[18]  Panos Kalnis,et al.  On the Anonymization of Sparse High-Dimensional Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[19]  Kyuseok Shim,et al.  Approximate algorithms for K-anonymity , 2007, SIGMOD '07.