(α, k)-anonymous data publishing

Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals to sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relationships to sensitive information in data. We discuss the properties of (α, k)-anonymity model. We prove that the optimal (α, k)-anonymity problem is NP-hard. We first present an optimal global-recoding method for the (α, k)-anonymity problem. Next we propose two scalable local-recoding algorithms which are both more scalable and result in less data distortion. The effectiveness and efficiency are shown by experiments. We also describe how the model can be extended to more general cases.

[1]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.

[2]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[3]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[4]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[5]  Ian Holyer,et al.  The NP-Completeness of Some Edge-Partition Problems , 1981, SIAM J. Comput..

[6]  Peter Buneman,et al.  Semistructured data , 1997, PODS.

[7]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[8]  Yufei Tao,et al.  M-invariance: towards privacy preserving re-publication of dynamic datasets , 2007, SIGMOD '07.

[9]  Raymond Chi-Wing Wong,et al.  Privacy preserving serial data publishing by role composition , 2008, Proc. VLDB Endow..

[10]  L. Cox Suppression Methodology and Statistical Disclosure Control , 1980 .

[11]  Raymond Chi-Wing Wong,et al.  Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures , 2006, DaWaK.

[12]  Philip S. Yu,et al.  Handicapping attacker's confidence: an alternative to k-anonymization , 2006, Knowledge and Information Systems.

[13]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[14]  Keki B. Irani,et al.  Multi-interval discretization of continuos attributes as pre-processing for classi cation learning , 1993, IJCAI 1993.

[15]  Ton de Waal,et al.  Introduction to Statistical Disclosure Control , 1996 .

[16]  Ton de Waal,et al.  Statistical Disclosure Control in Practice , 1996 .

[17]  Rajeev Motwani,et al.  Anonymizing Tables , 2005, ICDT.

[18]  Philip S. Yu,et al.  Template-based privacy preservation in classification problems , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[19]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[20]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[21]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[22]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[23]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[24]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[25]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[26]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[27]  Anco Hundepool The ARGUS Software in the CASC-Project , 2004, Privacy in Statistical Databases.

[28]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[29]  Jian Pei,et al.  Utility-based anonymization using local recoding , 2006, KDD '06.

[30]  Elisa Bertino,et al.  Association rule hiding , 2004, IEEE Transactions on Knowledge and Data Engineering.

[31]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[32]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[33]  Philip S. Yu,et al.  Bottom-up generalization: a data mining solution to privacy protection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).