Paper information - t-Closeness: Privacy Beyond k-Anonymity and l-Diversity

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of l-diversity has been proposed to address this; l-diversity requires that each equivalence class has at least l well-represented values for each sensitive attribute. In this paper we show that l-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.
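The t-closeness requirement described above can be illustrated with a minimal sketch. The abstract does not give the distance formulas, so the code assumes two common ground distances: for an ordered attribute with m values, the distance between the i-th and j-th values is taken as |i - j| / (m - 1), in which case the earth mover distance reduces to a sum of absolute cumulative differences; for an unordered attribute, all distinct values are taken to be at distance 1, in which case it reduces to half the L1 difference. The function names (`distribution`, `emd_ordered`, `emd_equal`, `satisfies_t_closeness`) and the salary data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def distribution(values, domain):
    """Relative frequency of each domain value within `values`."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


def emd_ordered(p, q):
    """Earth mover distance over an ordered domain.

    Assumes ground distance |i - j| / (m - 1) between the i-th and
    j-th domain values (an assumption, not stated in the abstract).
    """
    m = len(p)
    if m < 2:
        return 0.0
    running = 0.0  # cumulative sum of (p_i - q_i)
    total = 0.0
    # The final cumulative sum is always 0, so it can be skipped.
    for pi, qi in zip(p[:-1], q[:-1]):
        running += pi - qi
        total += abs(running)
    return total / (m - 1)


def emd_equal(p, q):
    """Earth mover distance when all distinct values are at distance 1."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def satisfies_t_closeness(classes, domain, t, emd=emd_ordered):
    """True if every equivalence class is within distance t of the table.

    `classes` is a list of lists, each holding the sensitive-attribute
    values of one equivalence class.
    """
    all_values = [v for cls in classes for v in cls]
    overall = distribution(all_values, domain)
    return all(emd(distribution(cls, domain), overall) <= t for cls in classes)


# Hypothetical salary values (in thousands), grouped into three classes.
domain = [3, 4, 5, 6, 7, 8, 9, 10, 11]
classes = [[3, 4, 5], [6, 8, 11], [7, 9, 10]]
overall = distribution([v for c in classes for v in c], domain)

# The low-salary class is far from the overall distribution (about 0.375),
# while the mixed class is much closer (about 0.167).
print(emd_ordered(distribution(classes[0], domain), overall))
print(emd_ordered(distribution(classes[1], domain), overall))
print(satisfies_t_closeness(classes, domain, t=0.4))
print(satisfies_t_closeness(classes, domain, t=0.3))
```

In this sketch each of the three classes contains three distinct salaries, yet the class holding only the three lowest salaries still reveals that its members are low earners. The distance to the overall distribution captures that skew, which a count of distinct values does not.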
