t-Closeness: Privacy Beyond k-Anonymity and l-Diversity

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of l-diversity has been proposed to address this; l-diversity requires that each equivalence class has at least l well-represented values for each sensitive attribute. In this paper we show that l-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. We propose a novel privacy notion called t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We choose to use the earth mover distance measure for our t-closeness requirement. We discuss the rationale for t-closeness and illustrate its advantages through examples and experiments.

[1]  C. Givens,et al.  A class of Wasserstein metrics for probability distributions. , 1984 .

[2]  George T. Duncan,et al.  Disclosure-Limited Data Dissemination , 1986 .

[3]  Ravindra K. Ahuja,et al.  Network Flows: Theory, Algorithms, and Applications , 1993 .

[4]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[5]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[6]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[7]  Ramayya Krishnan,et al.  Disclosure Limitation Methods and Information Loss for Tabular Data , 2001 .

[8]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[9]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[10]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[11]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[12]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[13]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[14]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[15]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[16]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[17]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[18]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[19]  Traian Marius Truta,et al.  Protection : p-Sensitive k-Anonymity Property , 2006 .

[20]  Qing Zhang,et al.  Aggregate Query Answering on Anonymized Tables , 2007, 2007 IEEE 23rd International Conference on Data Engineering.