Data privacy protection in multi-party clustering

Privacy concerns over sensitive data have become important in knowledge discovery. Data owners usually have different levels of concern about different data attributes, which complicates privacy protection. Moreover, collusion among malicious adversaries poses a severe threat to data security. In this paper, we present an efficient clustering method for distributed multi-party data sets that uses orthogonal transformation and perturbation techniques. The method allows data owners to apply different levels of privacy to different attributes, while the miner, who receives the perturbed data, can still obtain accurate clustering results. It protects data privacy not only in the semi-honest setting but also in the presence of collusion. We analyze the accuracy of the mining results, the privacy levels, and how both depend on the parameters of the method. We also propose an improved version of the method that alleviates the problem that arises with a large number of participants. Experimental results demonstrate the effectiveness of our method compared with existing methods.
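
The reason an orthogonal transformation can hide attribute values without harming distance-based clustering is that it preserves Euclidean distances between records. The following is a minimal sketch of that property only, not the protocol proposed in the paper: the helper names (`random_orthogonal`, `transform`, `dist`), the Gram-Schmidt construction, and the toy data are all assumptions made for illustration, and the perturbation step, the per-attribute privacy levels, and the multi-party exchange are omitted.

```python
import math
import random


def random_orthogonal(n, rng):
    """Build an n x n matrix with orthonormal rows via Gram-Schmidt (illustrative)."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components of v along the rows accepted so far.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip a (very unlikely) nearly dependent draw
            rows.append([a / norm for a in v])
    return rows


def transform(Q, x):
    """Apply the matrix Q to one record x (matrix-vector product)."""
    return [sum(q_i * x_i for q_i, x_i in zip(row, x)) for row in Q]


def dist(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
# Toy data: 6 records with 4 numeric attributes each (hypothetical values).
data = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(6)]
Q = random_orthogonal(4, rng)
hidden = [transform(Q, x) for x in data]

# Largest change in any pairwise distance caused by the transformation.
max_err = max(
    abs(dist(data[i], data[j]) - dist(hidden[i], hidden[j]))
    for i in range(len(data))
    for j in range(i + 1, len(data))
)
print("max pairwise distance change:", max_err)
```

Because the pairwise distances agree up to floating-point error, a distance-based clustering algorithm run on `hidden` would group the records as it would on `data`, even though the individual attribute values differ.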
