Incremental privacy preservation for associative classification

Privacy preservation has become an essential process in any data mining task, so the data must be transformed to protect privacy before they are mined. In this paper, we address the problem of privacy preservation in an incremental-data scenario, in which the data to be transformed are not static but are continually appended. Our work is based on a well-known data privacy model, k-Anonymity, while the data mining task to be applied to the given dataset is associative classification. Since the problem of privacy preservation for data mining has been proven NP-hard, we theoretically study the characteristics of a proven heuristic algorithm in incremental scenarios. We then present several observations that lead to techniques for reducing the computational complexity in the problem setting where the outputs remain the same. In addition, we propose a simple algorithm for the problem whose worst-case cost is no greater than that of the polynomial-time heuristic algorithm.
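A minimal sketch can show why an incremental setting can save work. The Python below is a hypothetical helper, not the paper's algorithm. It assumes the records have already been generalized under a fixed scheme, and that `qi_indices` (an illustrative parameter) lists the quasi-identifier columns. Under a fixed generalization, appending records can only grow equivalence classes. A class that already holds at least k records therefore stays k-anonymous, and only the classes touched by a new batch need to be re-examined.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, ...]


class IncrementalKAnonymityChecker:
    """Hypothetical helper: tracks equivalence-class sizes over already
    generalized records so k-anonymity can be re-checked after each append
    without rescanning earlier records."""

    def __init__(self, k: int, qi_indices: Sequence[int]) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.qi_indices = tuple(qi_indices)
        self.class_sizes: Counter = Counter()  # QI tuple -> number of records

    def _qi(self, record: Record) -> Record:
        # Project a record onto its quasi-identifier columns.
        return tuple(record[i] for i in self.qi_indices)

    def append(self, records: Iterable[Record]) -> List[Record]:
        """Add a batch and return the touched classes still smaller than k."""
        touched = set()
        for record in records:
            key = self._qi(record)
            self.class_sizes[key] += 1
            touched.add(key)
        # Untouched classes keep their previous size, so their status cannot
        # change; only the classes touched by this batch are re-examined.
        return sorted(key for key in touched if self.class_sizes[key] < self.k)

    def violating_classes(self) -> List[Record]:
        # All equivalence classes that currently have fewer than k records.
        return sorted(key for key, n in self.class_sizes.items() if n < self.k)

    def is_k_anonymous(self) -> bool:
        return not self.violating_classes()
```

For example, with `k=2` and the first two columns as quasi-identifiers, appending `("20-29", "M", "yes")` and `("20-29", "M", "no")` returns an empty list. A later append of a lone `("30-39", "F", "yes")` returns `[("30-39", "F")]`, the new class that would need further generalization or suppression. Deciding *how* to re-generalize while preserving classification quality is the harder part the paper addresses, and this sketch does not attempt it.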
