Paper Information - Generalized random rotation perturbation for vertically partitioned data sets

Random rotation is a common perturbation approach for privacy-preserving data classification: the data matrix is multiplied by a random rotation matrix before publication in order to preserve data privacy. A distinct advantage of this approach is that it maintains the geometric properties of the data matrix, so several categories of classifiers that rely on those properties achieve similar accuracy on the transformed data as on the original data. In this paper, we generalize this idea to the setting where the data matrix is vertically partitioned into several sub-matrices held by different owners. Each data holder chooses a rotation matrix randomly and independently to perturb its own data. All holders then send their transformed data to a third party, which assembles the complete data set for data mining or other analysis. We show that under this scheme the geometric properties of the data set are also preserved, so it maintains the accuracy of many classifiers and clustering techniques when they are applied to the transformed data instead of the original data. This method makes it possible to develop efficient centralized data mining algorithms, rather than distributed ones, while preserving privacy. Experiments on real data sets show that this generalization is effective for vertically partitioned data sets.
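The scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that rows are records and columns are attributes, that each holder owns a contiguous block of columns, and that each holder right-multiplies its block by its own rotation matrix. The helper names (`random_rotation`, `perturb_partitions`, `pairwise_sq_dists`) and the toy data are hypothetical. Under these assumptions the combined perturbation equals multiplication by a block-diagonal orthogonal matrix, so pairwise distances and inner products between records are unchanged.

```python
import numpy as np


def random_rotation(d, rng):
    """Draw a random d x d rotation (orthogonal, det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Scale each column by the sign of R's diagonal so the orthogonal
    # factor does not depend on the sign convention of the QR routine.
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so that the determinant is +1.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def perturb_partitions(parts, rng):
    """Each holder independently rotates its own block of attributes."""
    return [x @ random_rotation(x.shape[1], rng) for x in parts]


def pairwise_sq_dists(x):
    """Squared Euclidean distances between all pairs of rows of x."""
    gram = x @ x.T
    sq = np.diag(gram)
    return sq[:, None] + sq[None, :] - 2.0 * gram


rng = np.random.default_rng(0)

# Toy data: 6 records with 5 attributes, split between two holders
# (the first holds 2 attributes, the second holds 3).
X = rng.standard_normal((6, 5))
parts = np.split(X, [2], axis=1)

# Each holder perturbs its block; the third party concatenates the results.
Y = np.hstack(perturb_partitions(parts, rng))

# The records change, but their pairwise geometry does not.
distances_preserved = np.allclose(pairwise_sq_dists(X), pairwise_sq_dists(Y))
inner_products_preserved = np.allclose(X @ X.T, Y @ Y.T)
print(distances_preserved, inner_products_preserved)
```

Because distances and inner products are preserved in this sketch, a distance-based or kernel-based learner run by the third party on `Y` would see the same geometry as on `X`. Whether this matches the exact conventions of the paper (for example, records as columns instead of rows) is an assumption.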
