Flexible Anonymization For Privacy Preserving Data Publishing: A Systematic Search Based Approach

k-anonymity is a popular measure of privacy for data publishing: It measures the risk of identity-disclosure of individuals whose personal information are released in the form of published data for statistical analysis and data mining purposes(e.g. census data). Higher values of k denote higher level of privacy (smaller risk of disclosure). Existing techniques to achieve k-anonymity use a variety of “generalization” and “suppression” of cell values for multi-attribute data. At the same time, the released data needs to be as “information-rich” as possible to maximize its utility. Information loss becomes an even greater concern as more stringent privacy constraints are imposed [4]. The resulting optimization problems have proven to be computationally intensive for data sets with large attribute-domains. In this paper, we develop a systematic enumeration based branchand-bound technique that explores a much richer space of solutions than any previous method in literature. We further enhance the basic algorithm to incorporate heuristics that potentially accelerate the search process significantly.

[1]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[2]  William E. Winkler,et al.  Using Simulated Annealing for k-anonymity , 2002 .

[3]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[4]  Awi Federgruen,et al.  Structured Partitioning Problems , 1991, Oper. Res..

[5]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[6]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[7]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[8]  Piotr Berman,et al.  Exact Size of Binary Space Partitionings and Improved Rectangle Tiling Algorithms , 2002, SIAM J. Discret. Math..

[9]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[10]  Torsten Suel,et al.  Approximation algorithms for array partitioning problems , 2005, J. Algorithms.

[11]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[12]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[13]  Daniel Kifer,et al.  How to quickly find a witness , 2003, PODS '03.

[14]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[15]  Philip S. Yu,et al.  A Condensation Approach to Privacy Preserving Data Mining , 2004, EDBT.

[16]  Torsten Suel,et al.  On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications , 1999, ICDT.

[17]  L. Willenborg,et al.  Elements of Statistical Disclosure Control , 2000 .