Achieving k-anonymity using Minimum Spanning Tree based Partitioning

Protecting individuals' privacy has become a major concern in the privacy research community. Many frameworks and privacy principles have been proposed to protect data that is released to the public for mining purposes. Among these, k-anonymization is the most popular: it breaks the association between sensitive attributes and the identifiers of the individuals they belong to. In this paper, we propose an enhanced k-anonymity technique based on Minimum Spanning Tree (MST) partitioning. The technique discloses individuals' information only in groups of at least a minimum size k. It works in two phases. In the first phase, the MST built over the given dataset is partitioned to generate equivalence classes. In the second phase, each equivalence class is checked to verify that its size reaches the minimum group size k. Our approach achieves optimal anonymization while preserving data utility. We demonstrate the efficacy of the proposed technique through a series of experiments that measure information loss, showing that it preserves the quality of the anonymized data.
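The two phases described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes numeric quasi-identifiers compared by Euclidean distance, builds the MST with Prim's algorithm, and uses a simple hypothetical cutting rule (remove the longest edges first, keeping a cut only if every resulting component still holds at least k records). The function names `prim_mst`, `components`, `mst_partition`, and `satisfies_k` are introduced here for illustration only.

```python
import math


def prim_mst(points):
    """Return MST edges (weight, u, v) of the complete Euclidean graph on points."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def components(n, edges):
    """Connected components (sorted lists of record indices) of a forest."""
    adj = {i: [] for i in range(n)}
    for _, u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(sorted(comp))
    return comps


def mst_partition(points, k):
    """Phase 1 (assumed rule): cut MST edges longest first, keeping a cut
    only if every resulting component still has at least k records."""
    n = len(points)
    kept = prim_mst(points)
    for edge in sorted(kept, reverse=True):
        trial = [e for e in kept if e != edge]
        if all(len(c) >= k for c in components(n, trial)):
            kept = trial
    return components(n, kept)


def satisfies_k(classes, k):
    """Phase 2: verify that every equivalence class reaches size k."""
    return all(len(c) >= k for c in classes)
```

For example, with the six records `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and k = 3, only the long edge joining the two clusters is cut, giving two equivalence classes of three records each; with k = 4 no cut is admissible and all six records stay in one class. Generalizing the quasi-identifier values within each class and measuring information loss are left out of this sketch.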
