On the Privacy of Euclidean Distance Preserving Data Perturbation

We examine Euclidean distance preserving data perturbation as a tool for privacy-preserving data mining. Such perturbations allow many important data mining algorithms, e.g. hierarchical clustering and k-means clustering, to be applied to the perturbed data with only minor modification and to produce exactly the same results as on the original data. However, how well the original data is hidden needs careful study. We take a step in this direction by assuming the role of an attacker armed with two types of prior information regarding the original data. We examine how well the attacker can recover the original data from the perturbed data and the prior information. Our results offer insight into the vulnerabilities of Euclidean distance preserving transformations.
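As a minimal sketch of the property the abstract relies on, the following Python snippet perturbs a synthetic data set with a random orthogonal matrix and checks that all pairwise Euclidean distances are unchanged. Everything here is an assumption made for illustration: the abstract does not specify how the perturbation is generated, so the helper names `random_orthogonal` and `pairwise_distances`, the QR-based construction, and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Flip column signs so the diagonal of R is positive; q stays orthogonal.
    return q * np.sign(np.diag(r))


def pairwise_distances(x):
    """Return the matrix of Euclidean distances between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))  # synthetic data: 50 records, 4 attributes
Q = random_orthogonal(4, rng)     # illustrative distance-preserving map
Y = X @ Q.T                       # perturbed records, one per row

# Distances between records survive the perturbation, while the values do not.
distances_preserved = np.allclose(pairwise_distances(X), pairwise_distances(Y))
values_changed = not np.allclose(X, Y)
print(distances_preserved, values_changed)
```

Because algorithms such as hierarchical clustering depend on the data only through pairwise distances, they would group the rows of `Y` exactly as they group the rows of `X`. The same invariance is what an attacker with prior information about the original data can try to exploit.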
