K-anonymization incremental maintenance and optimization techniques

New privacy regulations, together with ever-increasing data availability and computational power, have created great interest in data privacy research. One major research direction is built around the k-anonymity property, which released data are required to satisfy. Although many k-anonymization algorithms exist for static data, a complete framework for coping with data evolution (a real-world scenario) has not previously been proposed. In this paper, we introduce algorithms for maintaining k-anonymized versions of large evolving datasets. These algorithms incrementally handle insert, delete, and update modifications to the dataset. Our results show that incremental maintenance is very efficient compared with existing techniques and preserves data quality. The second main contribution of this paper is an optimization algorithm that improves the quality of the solutions obtained by either the non-incremental or the incremental algorithms.
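
As an illustration of the property itself, and not of the algorithms proposed in this paper, the following minimal Python sketch checks k-anonymity by counting how many records share each combination of quasi-identifier values. It also shows how such counts could be kept up to date under insert/delete/update operations without rescanning the dataset. The names `is_k_anonymous`, `GroupCounts`, and the example attributes are hypothetical, and the sketch only detects violations; it does not repair them by generalization or suppression.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination present occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


class GroupCounts:
    """Hypothetical helper: tracks group sizes incrementally (illustration only)."""

    def __init__(self, quasi_identifiers, k):
        self.qi = list(quasi_identifiers)
        self.k = k
        self.counts = Counter()
        self.violations = 0  # number of non-empty groups smaller than k

    def _key(self, record):
        return tuple(record[a] for a in self.qi)

    def _adjust(self, key, delta):
        old = self.counts[key]  # Counter returns 0 for a missing key
        new = old + delta
        if new < 0:
            raise ValueError("cannot delete a record that was never inserted")
        # Update the violation count based on the group's old and new size.
        if 0 < old < self.k:
            self.violations -= 1
        if 0 < new < self.k:
            self.violations += 1
        if new == 0:
            del self.counts[key]
        else:
            self.counts[key] = new

    def insert(self, record):
        self._adjust(self._key(record), +1)

    def delete(self, record):
        self._adjust(self._key(record), -1)

    def update(self, old_record, new_record):
        # An update is treated as a delete followed by an insert.
        self.delete(old_record)
        self.insert(new_record)

    def is_k_anonymous(self):
        return self.violations == 0


if __name__ == "__main__":
    qi = ["age", "zip"]
    data = [
        {"age": "20-29", "zip": "476**", "disease": "flu"},
        {"age": "20-29", "zip": "476**", "disease": "cold"},
        {"age": "30-39", "zip": "479**", "disease": "flu"},
        {"age": "30-39", "zip": "479**", "disease": "asthma"},
    ]
    tracker = GroupCounts(qi, k=2)
    for row in data:
        tracker.insert(row)
    print(is_k_anonymous(data, qi, 2), tracker.is_k_anonymous())
```

Each modification here costs a constant number of dictionary operations, which is the basic reason an incremental approach can avoid re-anonymizing from scratch; restoring the property after a violation is the harder part and is outside the scope of this sketch.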
