Under Consideration for Publication in Knowledge and Information Systems

Geometric Data Perturbation for Privacy Preserving Outsourced Data Mining

Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is balancing privacy protection against data utility, which are normally considered a pair of conflicting factors. We argue that selectively preserving task- or model-specific information during perturbation helps achieve both a better privacy guarantee and better data utility. One type of such information is multidimensional geometric information, which many data-mining models use implicitly. To preserve this information under perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models deliver a level of model quality on the geometrically perturbed data set comparable to that on the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods, such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation against different levels of attacks. Finally, we use this evaluation framework to study several attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation not only provides a satisfactory privacy guarantee but also preserves modeling accuracy well.
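To make the idea of preserving multidimensional geometric information concrete, here is a minimal sketch in Python. The abstract does not spell out the transformation, so the sketch assumes a perturbation of the form y = R x + t + noise, where R is a random orthogonal matrix, t is a random translation, and the noise is small and Gaussian. The function names (`random_orthogonal`, `perturb`, `dist`), the dimension, and the noise level are illustrative choices, not taken from the paper.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a d x d orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows already accepted.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def perturb(points, R, t, noise_sd, rng):
    """Apply y = R x + t + noise to every point x (assumed form)."""
    out = []
    for x in points:
        y = []
        for row, t_i in zip(R, t):
            rotated = sum(r * x_j for r, x_j in zip(row, x))
            y.append(rotated + t_i + rng.gauss(0.0, noise_sd))
        out.append(y)
    return out


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
d = 4
points = [[rng.random() for _ in range(d)] for _ in range(5)]
R = random_orthogonal(d, rng)
t = [rng.random() for _ in range(d)]

exact = perturb(points, R, t, 0.0, rng)   # rotation + translation only
noisy = perturb(points, R, t, 0.01, rng)  # with small additive noise

print("original distance :", dist(points[0], points[1]))
print("after R x + t     :", dist(exact[0], exact[1]))
print("with added noise  :", dist(noisy[0], noisy[1]))
```

Without noise, the orthogonal transform and translation leave every pairwise Euclidean distance unchanged up to floating-point error, which is why distance-based models can be expected to behave similarly on the perturbed data. The noise term changes distances only slightly when its standard deviation is small relative to the data scale.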
