Anonymization with Worst-Case Distribution-Based Background Knowledge

Background knowledge is an important factor in privacy-preserving data publishing, and distribution-based background knowledge is one of its most well-studied forms. However, to the best of our knowledge, no existing work considers distribution-based background knowledge in the worst-case scenario, by which we mean that the adversary has accurate knowledge of the distribution of sensitive values conditioned on some tuple attributes. Considering this worst-case scenario is essential because no breaching possibility can be overlooked. In this paper, we propose an algorithm that anonymizes a dataset so as to protect individual privacy against this form of background knowledge. We prove that the anonymized datasets generated by our algorithm protect individual privacy, and our empirical studies show that our method simultaneously preserves high utility in the published data.
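To make the threat concrete, the sketch below illustrates (in a deliberately simplified model, not the paper's actual algorithm) how an adversary who knows the exact conditional distribution of sensitive values for a target's attributes can sharpen their belief beyond what the published, anonymized group alone reveals. The function name and the reweighting scheme are illustrative assumptions; the paper's formal privacy model is more involved.

```python
def posterior_breach(group_counts, prior):
    """Simplified illustration: an adversary's posterior belief that a
    target tuple in a published QI-group holds each sensitive value.
    The adversary reweights the group's sensitive-value frequencies by a
    known conditional distribution over sensitive values (worst-case
    distribution-based background knowledge). This weighting scheme is a
    hypothetical simplification for illustration only.
    """
    weights = {s: group_counts.get(s, 0) * prior.get(s, 0.0)
               for s in group_counts}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {s: w / total for s, w in weights.items()}

# Example: a QI-group is published with 3 'flu' tuples and 1 'HIV' tuple.
# With no background knowledge, belief in 'HIV' for the target is 1/4.
uniform = posterior_breach({"flu": 3, "HIV": 1}, {"flu": 0.5, "HIV": 0.5})
# With accurate distribution knowledge that strongly ties the target's
# attributes to 'HIV', the posterior rises well above 1/4 -- a breach
# that an anonymization scheme must account for in the worst case.
skewed = posterior_breach({"flu": 3, "HIV": 1}, {"flu": 0.2, "HIV": 0.8})
```

The point of the example is that the same published group is safe against a uniform prior but breaching against a skewed one, which is why the worst-case distribution must be considered when anonymizing.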
