Incremental processing and indexing for (k, e)-anonymisation

The emergence of internet-based services poses a privacy threat to individuals. Transforming data to meet a privacy standard has become a requirement for typical data processing in such services. (k, e)-anonymisation is one of the most promising data transformation approaches, since it can provide highly accurate aggregate query results. Although the computational cost of the algorithm that provides optimal solutions for this approach is not very high, i.e., O(n²), in certain environments the data to be processed can be appended at any time. In this paper, we address the efficiency of incremental privacy preservation using the (k, e)-anonymisation approach. We first analyse the impact of an increment theoretically, and then propose an incremental algorithm based on this analysis. The algorithm can replace the quadratic-complexity processing with a linear function on some part of the dataset, while still guaranteeing optimal results. Additionally, a few indexes are proposed to further improve the efficiency of the proposed algorithm. Experiments have been conducted to validate our work. The results show that the proposed work is highly efficient compared with the non-incremental algorithm and an approximation algorithm.
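
To make the quadratic cost concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes the usual reading of (k, e)-anonymity, in which every group must contain at least k distinct sensitive values whose range is at least e, and it assumes that the cost to minimise is the sum of the group ranges. The function names `satisfies_ke` and `min_total_range` are hypothetical.

```python
from math import inf


def satisfies_ke(values, k, e):
    """True if a group has >= k distinct sensitive values spanning a range >= e."""
    return len(set(values)) >= k and (max(values) - min(values)) >= e


def min_total_range(sensitive, k, e):
    """Quadratic dynamic programme over the sorted sensitive values.

    best[i] is the smallest sum of group ranges when the first i sorted
    values are split into contiguous groups that each satisfy (k, e).
    Returns inf when no valid partition exists.
    Assumption: the objective (sum of ranges) is chosen for illustration.
    """
    vals = sorted(sensitive)
    n = len(vals)
    best = [inf] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        distinct = 0
        # Grow the last group leftwards so that it covers vals[j-1 .. i-1].
        for j in range(i, 0, -1):
            # In sorted order, a value is new iff it differs from its right neighbour.
            if j == i or vals[j - 1] != vals[j]:
                distinct += 1
            width = vals[i - 1] - vals[j - 1]
            if distinct >= k and width >= e and best[j - 1] < inf:
                best[i] = min(best[i], best[j - 1] + width)
    return best[n]


if __name__ == "__main__":
    print(min_total_range([10, 20, 30, 40], k=2, e=10))
```

The two nested loops over n tuples give the O(n²) behaviour mentioned above. In a non-incremental setting the whole table would be recomputed whenever tuples are appended, which is the cost an incremental approach aims to avoid.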
