Paper information - Flexible Anonymization For Privacy Preserving Data Publishing: A Systematic Search Based Approach

Flexible Anonymization For Privacy Preserving Data Publishing: A Systematic Search Based Approach

k-anonymity is a popular measure of privacy for data publishing: it measures the risk of identity disclosure for individuals whose personal information is released in the form of published data for statistical analysis and data mining purposes (e.g., census data). Higher values of k denote a higher level of privacy (a smaller risk of disclosure). Existing techniques to achieve k-anonymity use a variety of “generalization” and “suppression” operations on cell values for multi-attribute data. At the same time, the released data needs to be as “information-rich” as possible to maximize its utility. Information loss becomes an even greater concern as more stringent privacy constraints are imposed [4]. The resulting optimization problems have proven to be computationally intensive for data sets with large attribute domains. In this paper, we develop a systematic enumeration-based branch-and-bound technique that explores a much richer space of solutions than any previous method in the literature. We further enhance the basic algorithm to incorporate heuristics that potentially accelerate the search process significantly.
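To make the k-anonymity measure concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes the usual definition (a table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows) and shows how generalizing one attribute can raise k. The sample rows, the attribute names (`age`, `zip`, `disease`), and the helpers `k_of` and `generalize_age` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_of(rows, quasi_identifiers):
    """Return the largest k for which `rows` is k-anonymous, i.e. the size of
    the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize_age(row, width):
    """Replace an exact age with a range `width` years wide.
    This is an illustrative generalization scheme, not one from the paper."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Hypothetical microdata: `age` and `zip` act as quasi-identifiers.
rows = [
    {"age": 23, "zip": "47677", "disease": "flu"},
    {"age": 27, "zip": "47677", "disease": "cold"},
    {"age": 31, "zip": "47677", "disease": "flu"},
    {"age": 38, "zip": "47677", "disease": "asthma"},
]
qi = ["age", "zip"]

raw_k = k_of(rows, qi)  # every row is unique on (age, zip)
generalized = [generalize_age(r, 10) for r in rows]
gen_k = k_of(generalized, qi)  # ages collapse into 20-29 and 30-39

print(raw_k, gen_k)
```

In this toy table, generalizing ages to ten-year ranges moves the data from 1-anonymous to 2-anonymous, at the cost of losing the exact ages. That loss is the information-loss side of the trade-off the abstract describes.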

[1] David J. DeWitt, et al. Incognito: efficient full-domain K-anonymity, 2005, SIGMOD '05.

[2] William E. Winkler, et al. Using Simulated Annealing for k-anonymity, 2002.

[3] Roberto J. Bayardo, et al. Data privacy through optimal k-anonymization, 2005, 21st International Conference on Data Engineering (ICDE'05).

[4] Awi Federgruen, et al. Structured Partitioning Problems, 1991, Oper. Res.

[5] Pierangela Samarati, et al. Protecting Respondents' Identities in Microdata Release, 2001, IEEE Trans. Knowl. Data Eng.

[6] Pierangela Samarati, et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression, 1998.

[7] David J. DeWitt, et al. Mondrian Multidimensional K-Anonymity, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[8] Piotr Berman, et al. Exact Size of Binary Space Partitionings and Improved Rectangle Tiling Algorithms, 2002, SIAM J. Discret. Math.

[9] Latanya Sweeney, et al. Achieving k-Anonymity Privacy Protection Using Generalization and Suppression, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[10] Torsten Suel, et al. Approximation algorithms for array partitioning problems, 2005, J. Algorithms.

[11] Vijay S. Iyengar, et al. Transforming data to satisfy privacy constraints, 2002, KDD.

[12] Philip S. Yu, et al. Top-down specialization for information and privacy preservation, 2005, 21st International Conference on Data Engineering (ICDE'05).

[13] Daniel Kifer, et al. How to quickly find a witness, 2003, PODS '03.

[14] Ashwin Machanavajjhala, et al. L-diversity: privacy beyond k-anonymity, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[15] Philip S. Yu, et al. A Condensation Approach to Privacy Preserving Data Mining, 2004, EDBT.

[16] Torsten Suel, et al. On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications, 1999, ICDT.

[17] L. Willenborg, et al. Elements of Statistical Disclosure Control, 2000.