Privacy preservation in k-means clustering by cluster rotation

The use of clustering as a data analysis tool has raised concerns about the violation of individual privacy. This paper proposes a data perturbation technique for privacy preservation in k-means clustering. Data objects that have been partitioned into clusters by k-means are perturbed by applying geometric transformations to the clusters such that the membership of each cluster and the relative orientation of the objects within each cluster are preserved. The transformation is achieved through cluster rotation, i.e., every cluster is rotated about its own centroid. Before rotation, the clusters are displaced away from the mean of the entire dataset so that no two clusters overlap once they are rotated. We analyze the level of privacy offered by this perturbation technique and prove that a dataset perturbed in this way cannot easily be reverse engineered, yet remains useful for cluster analysis.
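The two perturbation steps described above (displace, then rotate) can be sketched as follows. This is a minimal 2-D illustration under stated assumptions, not necessarily the paper's exact procedure: the clusters are assumed to have already been produced by k-means and are passed in as lists of (x, y) points; the helper names `centroid`, `radius` and `perturb_clusters`, the per-cluster rotation `angles` (acting as a secret key) and the safety factor `margin` are all hypothetical. The displacement rule is also an assumption: every centroid's offset from the global mean is scaled by the smallest factor s ≥ 1 that keeps each pair of clusters' enclosing circles apart.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def radius(points, c):
    """Distance from centroid c to the farthest point of the cluster."""
    return max(math.dist(p, c) for p in points)


def perturb_clusters(clusters, angles, margin=1.05):
    """Displace clusters away from the global mean, then rotate each about its own centroid.

    clusters: list of clusters (assumed k-means output), each a list of (x, y) points.
    angles:   hypothetical secret rotation angle (radians), one per cluster.
    margin:   hypothetical safety factor > 1 on the required centroid separation.
    """
    m = centroid([p for cl in clusters for p in cl])  # mean of the entire dataset
    cents = [centroid(cl) for cl in clusters]
    radii = [radius(cl, c) for cl, c in zip(clusters, cents)]

    # Rotating about a centroid keeps every point within r_i of it, so the clusters'
    # enclosing circles stay disjoint if each pair of new centroids is > r_i + r_j apart.
    # Scaling all offsets from m by s multiplies every centroid-to-centroid distance by s.
    s = 1.0
    for i in range(len(cents)):
        for j in range(i + 1, len(cents)):
            d = math.dist(cents[i], cents[j])
            if d == 0:
                raise ValueError("two clusters share a centroid")
            s = max(s, margin * (radii[i] + radii[j]) / d)

    perturbed = []
    for cl, (cx, cy), theta in zip(clusters, cents, angles):
        # Displaced centroid: push the old centroid away from m by the factor s.
        nx, ny = m[0] + s * (cx - m[0]), m[1] + s * (cy - m[1])
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        new_cl = []
        for x, y in cl:
            dx, dy = x - cx, y - cy  # offset from the cluster's own centroid
            # Rotate the offset by theta, then attach it to the displaced centroid.
            new_cl.append((nx + cos_t * dx - sin_t * dy, ny + sin_t * dx + cos_t * dy))
        perturbed.append(new_cl)
    return perturbed


# Usage: two small clusters whose enclosing circles initially overlap.
clusters = [[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
            [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]]
released = perturb_clusters(clusters, angles=[math.pi / 3, 1.1])
```

Because each cluster is only translated and rotated about its centroid, all intra-cluster distances are unchanged and each object keeps its cluster assignment (output lists are in the same order as the input). The uniform scaling about the global mean multiplies every inter-centroid distance by s, so taking s from the pair with the largest ratio (r_i + r_j)/d_ij is enough to separate every pair of enclosing circles.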
