An Attacker's View of Distance Preserving Maps for Privacy Preserving Data Mining

We examine the effectiveness of distance preserving transformations in privacy preserving data mining. These techniques are potentially very useful in that some important data mining algorithms can be efficiently applied to the transformed data and produce exactly the same results as if applied to the original data e.g. distance-based clustering, k-nearest neighbor classification. However, the issue of how well the original data is hidden has, to our knowledge, not been carefully studied. We take a step in this direction by assuming the role of an attacker armed with two types of prior information regarding the original data. We examine how well the attacker can recover the original data from the transformed data and prior information. Our results offer insight into the vulnerabilities of distance preserving transformations.

[1]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.

[2]  Stephen E. Fienberg,et al.  Data Swapping: Variations on a Theme by Dalenius and Reiss , 2004, Privacy in Statistical Databases.

[3]  William E. Winkler,et al.  Multiplicative Noise for Masking Continuous Data , 2001 .

[4]  GangopadhyayAryya,et al.  A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms , 2006, VLDB 2006.

[5]  Aryya Gangopadhyay,et al.  A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms , 2006, The VLDB Journal.

[6]  Elisa Bertino,et al.  Association rule hiding , 2004, IEEE Transactions on Knowledge and Data Engineering.

[7]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[8]  Keke Chen,et al.  Privacy preserving data classification with rotation perturbation , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[9]  Osmar R. Zaïane,et al.  Achieving Privacy Preservation when Sharing Data for Clustering , 2004, Secure Data Management.

[10]  Audra E. Kosh,et al.  Linear Algebra and its Applications , 1992 .

[11]  Qi Wang,et al.  Random-data perturbation techniques and privacy-preserving data mining , 2005, Knowledge and Information Systems.

[12]  R. Tennant Algebra , 1941, Nature.

[13]  G. Székely,et al.  TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION , 2004 .

[14]  Gene Tsudik,et al.  A Privacy-Preserving Index for Range Queries , 2004, VLDB.

[15]  Wenliang Du,et al.  Deriving private information from randomized data , 2005, SIGMOD '05.

[16]  Nabil R. Adam,et al.  Security-control methods for statistical databases: a comparative study , 1989, ACM Comput. Surv..

[17]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[18]  Kun Liu,et al.  Random projection-based multiplicative data perturbation for privacy preserving distributed data mining , 2006, IEEE Transactions on Knowledge and Data Engineering.

[19]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.