The Role of Quasi-identifiers in k-Anonymity Revisited

The concept of k-anonymity, used in the recent literature to formally evaluate the privacy preservation of published tables, was introduced based on the notion of quasi-identifiers (or QI for short). The process of obtaining k-anonymity for a given private table is first to recognize the QIs in the table, and then to anonymize the QI values, the latter being called k-anonymization. While k-anonymization is usually rigorously validated by the authors, the definition of QI remains mostly informal, and different authors seem to have different interpretations of the concept of QI. The purpose of this paper is to provide a formal underpinning of QI and examine the correctness and incorrectness of various interpretations of QI in our formal framework. We observe that in cases where the concept has been used correctly, its application has been conservative; this note provides a formal understanding of the conservative nature in such cases.

[1]  Hamid Pirahesh,et al.  The Magic of Duplicates and Aggregates , 1990, VLDB.

[2]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[3]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[4]  Andreas Pfitzmann,et al.  Anonymity, Unobservability, and Pseudonymity - A Proposal for Terminology , 2000, Workshop on Design Issues in Anonymity and Unobservability.

[5]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[6]  David Chaum,et al.  The dining cryptographers problem: Unconditional sender and recipient untraceability , 1988, Journal of Cryptology.

[7]  Philip S. Yu,et al.  Handicapping attacker's confidence: an alternative to k-anonymization , 2006, Knowledge and Information Systems.

[8]  David J. DeWitt,et al.  Workload-aware anonymization , 2006, KDD '06.

[9]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[10]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[11]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[12]  Sushil Jajodia,et al.  Indistinguishability: The Other Aspect of Privacy , 2006, Secure Data Management.

[13]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..