An incremental privacy-preservation algorithm for the (k, e)-Anonymous model

An efficient algorithm is developed to prevent incremental privacy breaches. Only the most recent previously released dataset is required for privacy preservation. The solution is always guaranteed to be optimal.

Data privacy is an important issue whenever data are to be published. This paper addresses data privacy under a prominent privacy model, (k, e)-Anonymous. In our scenario, when a new dataset is to be released, other versions of the same dataset may already have been released elsewhere. A problem arises because an attacker might obtain multiple versions of the dataset and compare them with the newly released one. Although the privacy of each dataset has been well preserved individually, such a comparison can lead to a privacy breach, the so-called "incremental privacy breach". To address this problem effectively, we first study the effects of multiple dataset releases theoretically. We find that an incremental privacy breach occurs when any part of the new dataset overlaps with any part of an existing dataset. Based on these findings, we propose a polynomial-time algorithm. The algorithm needs to consider only one previous version of the dataset, and it can also skip computing the overlapping partitions. Thus, the computational complexity is reduced from O(n^m) to only O(pn^3), where p is the number of partitions, n is the number of tuples, and m is the number of released datasets. At the same time, the privacy of all released datasets, as well as the optimality of the solution, is always guaranteed. In addition, we present experimental results on real-world datasets that illustrate the efficiency of the algorithm.
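As a rough illustration of the breach described above, the following Python sketch checks two releases for overlapping partitions whose comparison narrows an individual's sensitive value too far. It is not the paper's algorithm. It assumes the usual reading of the (k, e)-Anonymous model (each partition holds at least k distinct sensitive values whose range is at least e), and it assumes a simplified attacker who intersects the sensitive values of two partitions that share an individual. The names `is_ke_anonymous` and `find_incremental_breaches`, and the representation of a partition as a dict from tuple id to sensitive value, are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: one partition maps a tuple id to its
# numeric sensitive value; a release is a list of partitions.
Partition = Dict[str, float]


def is_ke_anonymous(values: Iterable[float], k: int, e: float) -> bool:
    """Assumed (k, e) condition: at least k distinct sensitive values
    whose range (max - min) is at least e."""
    distinct = set(values)
    if not distinct:
        return False
    return len(distinct) >= k and (max(distinct) - min(distinct)) >= e


def find_incremental_breaches(
    previous: List[Partition], new: List[Partition], k: int, e: float
) -> List[Tuple[int, int]]:
    """Return (old_index, new_index) pairs of overlapping partitions whose
    common sensitive values no longer satisfy the (k, e) condition.

    Simplified attacker model (an assumption, not taken from the paper):
    an individual present in both partitions must have a sensitive value
    that appears in both of them.
    """
    breaches = []
    for i, old_part in enumerate(previous):
        for j, new_part in enumerate(new):
            shared_ids = old_part.keys() & new_part.keys()
            if not shared_ids:
                continue  # no overlap, so nothing to compare
            candidates = set(old_part.values()) & set(new_part.values())
            if not is_ke_anonymous(candidates, k, e):
                breaches.append((i, j))
    return breaches


previous_release = [
    {"a": 30, "b": 40, "c": 50},
    {"d": 60, "e": 70, "f": 80},
]
new_release = [
    {"a": 30, "d": 60, "g": 45},
    {"b": 40, "c": 50, "e": 70, "f": 80},
]
breaches = find_incremental_breaches(previous_release, new_release, k=2, e=10)
print(breaches)
```

In this toy input every partition satisfies the assumed (2, 10) condition on its own, yet comparing the two releases pins the individuals `a` and `d` to a single sensitive value each. The sketch compares every pair of partitions, which is the naive approach; the abstract states that the proposed algorithm avoids part of this work by considering only one previous release and skipping the overlapping partitions.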
