MAGE: A semantics retaining K-anonymization method for mixed data

K-anonymity is a fine approach to protecting privacy in the release of microdata for data mining. Microaggregation and generalization are two typical methods to implement k-anonymity. But both of them have some defects on anonymizing mixed microdata. To address the problem, we propose a novel anonymization method, named MAGE, which can retain more semantics than generalization and microaggregation in dealing with mixed microdata. The idea of MAGE is to combine the mean vector of numerical data with the generalization values of categorical data as a clustering centroid and to use it as incarnation of the tuples in the corresponding cluster. We also propose an efficient TSCKA algorithm to anonymize mixed data. Experimental results show that MAGE can anonymize mixed microdata effectively and the TSCKA algorithm can achieve better trade-off between data quality and algorithm efficiency comparing with two well-known anonymization algorithms, Incognito and KACA.

[1]  Alina Campan,et al.  K-anonymization incremental maintenance and optimization techniques , 2007, SAC '07.

[2]  Yufei Tao,et al.  Preservation of proximity privacy in publishing numerical sensitive data , 2008, SIGMOD Conference.

[3]  Dimitris Sacharidis,et al.  K-anonymity in the Presence of External Databases , 2022 .

[4]  Chris Clifton,et al.  Multirelational k-Anonymity , 2009, IEEE Trans. Knowl. Data Eng..

[5]  Chris Clifton,et al.  A secure distributed framework for achieving k-anonymity , 2006, The VLDB Journal.

[6]  Qing Zhang,et al.  Aggregate Query Answering on Anonymized Tables , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[7]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[8]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[9]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[10]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[11]  Josep Domingo-Ferrer,et al.  A polynomial-time approximation to optimal multivariate microaggregation , 2008, Comput. Math. Appl..

[12]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[13]  Ninghui Li,et al.  Towards optimal k-anonymization , 2008, Data Knowl. Eng..

[14]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[15]  Josep Domingo-Ferrer,et al.  Practical Data-Oriented Microaggregation for Statistical Disclosure Control , 2002, IEEE Trans. Knowl. Data Eng..

[16]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[17]  Charu C. Aggarwal,et al.  On k-Anonymity and the Curse of Dimensionality , 2005, VLDB.

[18]  Ninghui Li,et al.  Slicing: A New Approach for Privacy Preserving Data Publishing , 2009, IEEE Transactions on Knowledge and Data Engineering.

[19]  Chin-Chen Chang,et al.  TFRP: An efficient microaggregation algorithm for statistical disclosure control , 2007, J. Syst. Softw..

[20]  Michael J. Laszlo,et al.  Minimum spanning tree partitioning algorithm for microaggregation , 2005, IEEE Transactions on Knowledge and Data Engineering.

[21]  Pei-Chann Chang,et al.  Density-based microaggregation for statistical disclosure control , 2010, Expert Syst. Appl..

[22]  Cristina Nita-Rotaru,et al.  A survey of attack and defense techniques for reputation systems , 2009, CSUR.

[23]  Josep Domingo-Ferrer,et al.  On the complexity of optimal microaggregation for statistical disclosure control , 2001 .

[24]  Vicenç Torra,et al.  Microaggregation for Categorical Variables: A Median Based Approach , 2004, Privacy in Statistical Databases.

[25]  Philip S. Yu,et al.  Privacy-preserving data publishing: A survey of recent developments , 2010, CSUR.

[26]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[27]  Josep Domingo-Ferrer,et al.  Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation , 2005, Data Mining and Knowledge Discovery.

[28]  Yufei Tao,et al.  ANGEL: Enhancing the Utility of Generalization for Privacy Preserving Publication , 2009, IEEE Transactions on Knowledge and Data Engineering.

[29]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[30]  Girish Agarwal,et al.  A Review On Data Anonymization Technique For Data Publishing , 2012 .

[31]  Raymond Chi-Wing Wong,et al.  Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures , 2006, DaWaK.

[32]  Javier Jiménez,et al.  An evolutionary approach to enhance data privacy , 2011, Soft Comput..

[33]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[34]  Elisa Bertino,et al.  Secure Anonymization for Incremental Datasets , 2006, Secure Data Management.

[35]  Huiqun Yu,et al.  An Efficient Microaggregation Algorithm for Mixed Data , 2008, 2008 International Conference on Computer Science and Software Engineering.