Preserving Individual Privacy in Serial Data Publishing

While previous work on privacy-preserving serial data publishing considers the scenario in which sensitive values may persist over multiple data releases, we find that no previous work provides sufficient protection for sensitive values that can change over time, which is likely the more common case. In this work we study the privacy guarantee for such transient sensitive values, which we call the global guarantee. We formally define the problem of achieving this guarantee and derive several of its theoretical properties. We show that the sizes of the anonymized groups used in data anonymization are a key factor in protecting individual privacy in serial publication. We propose two anonymization strategies, aimed at minimizing the average group size and the maximum group size, respectively. Finally, we conduct experiments on a medical dataset showing that our method is highly efficient and produces published data of very high utility.
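
The abstract does not state the formal definition of the global guarantee, so the following is only a minimal sketch under a simplified, hypothetical model that illustrates why group size matters across releases. It assumes that in release i an individual sits in an anonymized group of size g_i whose sensitive values are all distinct, so an adversary's best guess is correct with probability 1/g_i, and that guesses in different releases are independent (plausible when the sensitive value changes between releases). The probability of a correct guess in at least one release is then 1 - ∏(1 - 1/g_i). The function names are illustrative, not taken from the paper.

```python
from math import prod


def global_breach_probability(group_sizes):
    """Probability that an adversary correctly guesses an individual's
    sensitive value in at least one release.

    Hypothetical toy model (an assumption, not the paper's definition):
    - in release i the individual is in a group of size group_sizes[i]
      with all-distinct sensitive values, so a guess succeeds w.p. 1/g;
    - guesses in different releases are independent.
    """
    # Probability of failing in every release, then take the complement.
    return 1.0 - prod(1.0 - 1.0 / g for g in group_sizes)


def min_uniform_group_size(num_releases, threshold):
    """Smallest group size g such that using g in every one of
    num_releases releases keeps the toy-model breach probability
    at or below threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie in (0, 1)")
    g = 1
    # Smaller groups preserve more utility, so search upward from 1.
    while global_breach_probability([g] * num_releases) > threshold:
        g += 1
    return g


if __name__ == "__main__":
    # Each release alone gives a 1/2 guess, but three releases compound.
    print(global_breach_probability([2, 2, 2]))
    print(min_uniform_group_size(1, 0.5), min_uniform_group_size(3, 0.5))
```

Under this toy model, a group size of 2 in each of three releases keeps every single-release guess at 1/2 yet yields a global breach probability of 0.875, and holding the threshold at 0.5 requires groups of size 2 for one release but size 5 for three. Since smaller groups generally preserve more utility, the group size is the natural quantity to minimize once a global bound must hold, which is the lever both proposed strategies act on.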
