Generalized random rotation perturbation for vertically partitioned data sets

Random rotation is one of the common perturbation approaches for privacy preserving data classification, in which the data matrix is multiplied by a random rotation matrix before publishing in order to preserve data privacy. One distinct advantage of this approach is that it can maintain the geometric properties of the data matrix, so several categories of classifiers that are based on the geometric properties of the data can achieve similar accuracy on the transformed data as that on the original data. In this paper, we generalize this idea to the situation where the data matrix is assumed to be vertically partitioned into several sub-matrices and held by different owners. Each data holder can choose a rotation matrix randomly and independently to perturb their individual data. Then they all send the transformed data to a third party, who collects all of them and forms a whole data set for data mining or other analysis purposes. We show that under such a scheme the geometric properties of the data set is also preserved and thus it can maintain the accuracy of many classifiers and clustering techniques applied on the transformed data as on the original data. This method enables us to develop efficient centralized data mining algorithms instead of distributed algorithms to preserve privacy. Experiments on real data sets show that such generalization is effective for vertically partitioned data sets.