Anonymizing sequential releases

An organization makes a new release as new information become available, releases a tailored view for each data request, releases sensitive information and identifying information separately. The availability of related releases sharpens the identification of individuals by a global quasi-identifier consisting of attributes from related releases. Since it is not an option to anonymize previously released data, the current release must be anonymized to ensure that a global quasi-identifier is not effective for identification. In this paper, we study the sequential anonymization problem under this assumption. A key question is how to anonymize the current release so that it cannot be linked to previous releases yet remains useful for its own release purpose. We introduce the lossy join, a negative property in relational database design, as a way to hide the join relationship among releases, and propose a scalable and practical solution.

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[3]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[4]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[5]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[6]  Chris Clifton,et al.  Using Sample Size to Limit Exposure to Data Mining , 2000, J. Comput. Secur..

[7]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[8]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[9]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[10]  Philip S. Yu,et al.  Bottom-up generalization: a data mining solution to privacy protection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[11]  Bradley Malin,et al.  How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems , 2004, J. Biomed. Informatics.

[12]  Dan Suciu,et al.  A formal analysis of information disclosure in data exchange , 2004, SIGMOD '04.

[13]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[14]  Rajeev Motwani,et al.  Anonymizing Tables , 2005, ICDT.

[15]  Alin Deutsch,et al.  Privacy in Database Publishing , 2005, ICDT.

[16]  Philip S. Yu,et al.  Template-based privacy preservation in classification problems , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[17]  Benjamin C. M. Fung,et al.  Integrating Private Databases for Data Analysis , 2005, ISI.

[18]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[19]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[20]  Sushil Jajodia,et al.  Checking for k-Anonymity Violation by Views , 2005, VLDB.

[21]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[22]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[23]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[24]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[25]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[26]  Daniel Kifer,et al.  Injecting utility into anonymized datasets , 2006, SIGMOD Conference.

[27]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[28]  Philip S. Yu,et al.  Handicapping attacker's confidence: an alternative to k-anonymization , 2006, Knowledge and Information Systems.