Clustering-Based K-Anonymity Algorithm for Privacy Preservation

K-anonymity is an effective model for protecting privacy when publishing data, and it can be implemented in different ways. Among these, local generalization is popular because of its low information loss. However, such algorithms are generally computationally expensive, which makes it difficult for them to perform well on large amounts of data. To address this problem, this paper proposes a clustering-based K-anonymity algorithm and optimizes it with parallelization. Experimental results show that, compared with the existing KACA [23] and Incognito [21] algorithms, the proposed algorithm achieves lower information loss and better performance.
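The abstract does not describe the clustering algorithm itself, so the following is only a minimal sketch of the general idea, not the paper's method and not KACA. It assumes numeric quasi-identifiers and uses a greedy k-member-style clustering: take a seed, add its k-1 nearest unassigned records, pick the farthest remaining record as the next seed, and attach any fewer-than-k leftovers to the cluster with the nearest seed. Each cluster is then locally generalized to per-attribute [min, max] ranges, and information loss is measured as interval width divided by attribute range, summed over rows. All function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]              # numeric quasi-identifier values of one row
Box = Tuple[Tuple[float, float], ...]   # generalized row: one (low, high) per attribute


def attribute_spans(records: Sequence[Record]) -> List[float]:
    """Range (max - min) of each quasi-identifier over the whole table."""
    return [max(col) - min(col) for col in zip(*records)]


def distance(a: Record, b: Record, spans: Sequence[float]) -> float:
    """Range-normalized Manhattan distance; constant attributes contribute 0."""
    return sum(abs(x - y) / s if s else 0.0 for x, y, s in zip(a, b, spans))


def cluster_k_members(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Hypothetical greedy clustering into groups of at least k records."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    spans = attribute_spans(records)
    remaining = list(records)
    clusters: List[List[Record]] = []
    seed = remaining.pop(0)
    while True:
        # Complete the cluster with the k-1 unassigned records nearest the seed.
        remaining.sort(key=lambda r: distance(seed, r, spans))
        clusters.append([seed] + remaining[: k - 1])
        remaining = remaining[k - 1 :]
        if len(remaining) < k:
            break
        # Next seed: the unassigned record farthest from the current seed.
        seed = remaining.pop()
    # Fewer than k records left: each joins the cluster whose seed (index 0) is nearest.
    for r in remaining:
        min(clusters, key=lambda c: distance(c[0], r, spans)).append(r)
    return clusters


def generalize(cluster: Sequence[Record]) -> Box:
    """Local generalization: each attribute becomes the cluster's (min, max)."""
    return tuple((min(col), max(col)) for col in zip(*cluster))


def information_loss(clusters: Sequence[Sequence[Record]], spans: Sequence[float]) -> float:
    """Interval width / attribute span, summed over attributes and over every row."""
    total = 0.0
    for c in clusters:
        total += len(c) * sum(
            (hi - lo) / s if s else 0.0 for (lo, hi), s in zip(generalize(c), spans)
        )
    return total


def is_k_anonymous(rows: Sequence[Box], k: int) -> bool:
    """Every distinct generalized row must occur at least k times."""
    return all(n >= k for n in Counter(rows).values())


# Usage with hypothetical (age, ZIP code) rows.
records = [(25, 47601), (27, 47602), (31, 47605), (33, 47606),
           (52, 47901), (55, 47905), (58, 47910)]
clusters = cluster_k_members(records, k=3)
rows = [generalize(c) for c in clusters for _ in c]
print(rows)
print(is_k_anonymous(rows, 3), information_loss(clusters, attribute_spans(records)))
```

Here the seven rows fall into one cluster of four (ages 25 to 33) and one of three (ages 52 to 58), so every generalized row appears at least three times. The per-seed distance evaluations are independent of one another, which makes them one natural target for parallel execution; the abstract does not say which part of its algorithm the authors parallelize.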

[1] A. J. Hundepool. ARGUS: Software for Statistical Disclosure Control of Microdata, 1995.

[2] Saeed Jalili et al. Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization, 2015, Data Mining and Knowledge Discovery.

[3] Dong Li et al. Semi-Homogenous Generalization: Improving Homogenous Generalization for Privacy Preservation in Cloud Computing, 2016, Journal of Computer Science and Technology.

[4] Zhiqiang Xie et al. The privacy preserving method for dynamic trajectory releasing based on adaptive clustering, 2017, Inf. Sci.

[5] Latanya Sweeney. Computational disclosure control: a primer on data privacy protection, 2001.

[6] Anitha S. Pillai et al. Disclosure risk of individuals: A k-anonymity study on health care data related to Indian population, 2014, 2014 International Conference on Data Science & Engineering (ICDSE).

[7] Pierangela Samarati and Latanya Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information, 1998, PODS 1998.

[8] Xiaokui Xiao and Yufei Tao. Anatomy: simple and effective privacy preservation, 2006, VLDB.

[9] Chris Clifton et al. Defining Privacy for Data Mining, 2002.

[10] Vijay S. Iyengar. Transforming data to satisfy privacy constraints, 2002, KDD.

[11] Kristen LeFevre et al. Mondrian Multidimensional K-Anonymity, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[12] Charu C. Aggarwal and Philip S. Yu. A Condensation Approach to Privacy Preserving Data Mining, 2004, EDBT.

[13] Manish Sharma et al. A Review Study on the Privacy Preserving Data Mining Techniques and Approaches, 2013.

[14] Hong Shen et al. Privacy-preserving data publishing for multiple numerical sensitive attributes, 2015.

[15] Kentaro Hayashi et al. Bottom-Up Cell Suppression that Preserves the Missing-at-random Condition, 2016, TrustBus.

[16] K. Dhivya. Privacy Preserving Updates Using Generalization-Based and Suppression-Based K-Anonymity, 2014.

[17] Charu C. Aggarwal and Philip S. Yu. On static and dynamic methods for condensation-based privacy-preserving data mining, 2008, TODS.

[18] Devesh C. Jinwala et al. Novel Approaches for Privacy Preserving Data Mining in k-Anonymity Model, 2016, J. Inf. Sci. Eng.

[19] Latanya Sweeney. k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[20] Charu C. Aggarwal and Philip S. Yu. A framework for condensation-based anonymization of string data, 2008, Data Mining and Knowledge Discovery.

[21] Kristen LeFevre et al. Incognito: efficient full-domain K-anonymity, 2005, SIGMOD '05.

[22] Saeed Jalili et al. Fast data-oriented microaggregation algorithm for large numerical datasets, 2014, Knowl. Based Syst.

[23] Jiuyong Li et al. Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures, 2006, DaWaK.