An Anonymization Method to Improve Data Utility for Classification

k-anonymity is a popular method for preserving privacy in microdata, but it sacrifices data utility to protect individuals’ privacy. How to preserve privacy while retaining high data utility has therefore become a hot topic in k-anonymity research. Existing anonymization methods seldom consider data utility for a specific data mining task. To address this problem, we define a novel attribute weight measurement for determining the generalization order, and propose a new anonymization algorithm based on this measurement that uses global generalization, called the Weighted Full-Domain Anonymization (WFDA) algorithm. The main idea of the algorithm is to generalize attributes with large weights to low levels and attributes with small weights to high levels. The proposed algorithm preserves data utility for classification to a large extent. Experiments show that anonymized data produced by the proposed method retains higher utility, i.e., yields better classification accuracy, than data generated by other anonymization methods.
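The idea of weight-ordered full-domain generalization can be illustrated with a minimal Python sketch. This is not the WFDA algorithm itself: the abstract does not define the weight measurement or the search procedure, so the weights below are hand-set placeholders, the greedy loop is an assumed strategy, and all names (`is_k_anonymous`, `generalize`, `weighted_full_domain`, the toy hierarchies) are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

def generalize(records, hierarchies, levels):
    """Full-domain (global) generalization: every value of attribute i
    is mapped to the same hierarchy level levels[i]."""
    return [
        tuple(hierarchies[i][levels[i]](value) for i, value in enumerate(rec))
        for rec in records
    ]

def weighted_full_domain(records, hierarchies, weights, k):
    """Assumed greedy strategy: raise the level of the lowest-weight
    attribute first, so high-weight attributes stay at low levels."""
    levels = [0] * len(hierarchies)
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    while True:
        rows = generalize(records, hierarchies, levels)
        if is_k_anonymous(rows, k):
            return rows, levels
        for i in order:
            if levels[i] < len(hierarchies[i]) - 1:
                levels[i] += 1
                break
        else:
            raise ValueError("k-anonymity not reachable with these hierarchies")

# Toy quasi-identifiers: (age, zip code). Level 0 is the raw value.
hierarchies = [
    [lambda a: a, lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}", lambda a: "*"],
    [lambda z: z, lambda z: z[:3] + "**", lambda z: "*"],
]
records = [
    (23, "13053"), (27, "13068"), (25, "13053"),
    (34, "14850"), (38, "14853"), (36, "14850"),
]
# Placeholder weights (assumption): age matters more for the classifier.
weights = [0.8, 0.2]

rows, levels = weighted_full_domain(records, hierarchies, weights, k=2)
print(levels)  # [1, 2]
print(rows[0])  # ('20-29', '*')
```

In this toy run the low-weight zip code is generalized all the way to `*`, while the high-weight age attribute only rises to decade ranges, which is the behaviour the abstract describes for large and small weights.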
