Butterfly: Privacy Preserving Publishing on Multiple Quasi-Identifiers

Recently, privacy preserving data publishing has attracted significant interest in research. Most of the existing studies focus on only the situations where the data in question is published using one quasi-identifier. However, in a few important applications, a practical demand is to publish a data set on multiple quasi-identifiers for multiple users simultaneously, which poses several challenges. How can we generate one anonymized version of the data so that the privacy preservation requirement like k-anonymity is satisfied for all users? Moreover, how can we reduce the information loss as much as possible while the privacy preservation requirements are met? In this paper, we identify and tackle the novel problem of privacy preserving publishing on multiple quasi-identifiers. A naive solution of respectively publishing multiple versions for different quasi-identifiers unfortunately suffers from the possibility that those releases can be joined to intrude the privacy. Interestingly, we show that it is possible to generate only one anonymized table to satisfy the k-anonymity on all quasi-identifiers for all users without significant information loss. We systematically develop an effective method for privacy preserving publishing for multiple users, and report an empirical study using real data to verify the feasibility and the effectiveness of our method.

[1]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[2]  Jian Pei,et al.  Utility-based anonymization using local recoding , 2006, KDD '06.

[3]  Panos Kalnis,et al.  Fast Data Anonymization with Low Information Loss , 2007, VLDB.

[4]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[5]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[6]  Kyuseok Shim,et al.  Approximate algorithms for K-anonymity , 2007, SIGMOD '07.

[7]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[8]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[9]  Philip S. Yu,et al.  A Condensation Approach to Privacy Preserving Data Mining , 2004, EDBT.

[10]  Charu C. Aggarwal,et al.  On privacy preservation against adversarial data mining , 2006, KDD '06.

[11]  Chris Clifton,et al.  Using Sample Size to Limit Exposure to Data Mining , 2000, J. Comput. Secur..

[12]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[13]  Elisa Bertino,et al.  Association rule hiding , 2004, IEEE Transactions on Knowledge and Data Engineering.

[14]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[15]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[16]  Elisa Bertino,et al.  An access control model supporting periodicity constraints and temporal reasoning , 1998, TODS.

[17]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[18]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[19]  Feng Zhu,et al.  On Multidimensional k-Anonymity with Local Recoding Generalization , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[20]  Chris Clifton,et al.  A secure distributed framework for achieving k-anonymity , 2006, The VLDB Journal.

[21]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[22]  Chris Clifton,et al.  Hiding the presence of individuals from shared databases , 2007, SIGMOD '07.

[23]  Daniel Kifer,et al.  Injecting utility into anonymized datasets , 2006, SIGMOD Conference.

[24]  Rajeev Motwani,et al.  Towards robustness in query auditing , 2006, VLDB.

[25]  Raymond Chi-Wing Wong,et al.  Minimality Attack in Privacy Preserving Data Publishing , 2007, VLDB.

[26]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[27]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[28]  Ashwin Machanavajjhala,et al.  Worst-Case Background Knowledge for Privacy-Preserving Data Publishing , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[29]  Qing Zhang,et al.  Aggregate Query Answering on Anonymized Tables , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[30]  Benjamin C. M. Fung,et al.  Anonymizing sequential releases , 2006, KDD '06.

[31]  Dan Suciu,et al.  The Boundary Between Privacy and Utility in Data Publishing , 2007, VLDB.

[32]  Elisa Bertino,et al.  Secure and selective dissemination of XML documents , 2002, TSEC.

[33]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[34]  Raghu Ramakrishnan,et al.  Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge , 2007, VLDB.

[35]  Philip S. Yu,et al.  Template-based privacy preservation in classification problems , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[36]  Samir Khuller,et al.  Achieving anonymity via clustering , 2006, PODS '06.

[37]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[38]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[39]  Rajeev Motwani,et al.  Anonymizing Tables , 2005, ICDT.

[40]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[41]  Yufei Tao,et al.  M-invariance: towards privacy preserving re-publication of dynamic datasets , 2007, SIGMOD '07.

[42]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[43]  Christos Faloutsos,et al.  The R+-Tree: A Dynamic Index for Multi-Dimensional Objects , 1987, VLDB.

[44]  D. DeWitt,et al.  K-Anonymization as Spatial Indexing: Toward Scalable and Incremental Anonymization , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[45]  David J. DeWitt,et al.  Workload-aware anonymization , 2006, KDD '06.

[46]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[47]  Ramakrishnan Srikant,et al.  Hippocratic Databases , 2002, VLDB.

[48]  Jörg Rothe,et al.  Some facets of complexity theory and cryptography: A five-lecture tutorial , 2001, CSUR.