A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms

Privacy-preserving data mining has become increasingly popular because it allows privacy-sensitive data to be shared for analysis. However, existing techniques such as random perturbation perform poorly with Euclidean distance-based mining algorithms, which are simple, efficient, and widely used. Although the original data distributions can be reconstructed fairly accurately from the perturbed data, distances between individual data points are not preserved, which leads to poor accuracy for distance-based mining methods. Moreover, these techniques generally do not address data reduction. Other studies, based on secure multi-party computation, often concentrate on techniques tailored to very specific mining algorithms and scenarios; they require modification of the mining algorithms and are often difficult to generalize to other algorithms or scenarios. This paper proposes a novel generalized approach that uses the well-known energy compaction property of Fourier-related transforms to hide sensitive data values while approximately preserving Euclidean distances, with high accuracy, in both centralized and distributed scenarios. Three algorithms for selecting the most important transform coefficients are presented: one for a centralized database, one for a horizontally partitioned database, and one for a vertically partitioned database. Experimental results demonstrate the effectiveness of the proposed approach.
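
The core idea, that an orthonormal Fourier-related transform preserves Euclidean distances and that keeping only a few high-energy coefficients preserves them approximately, can be illustrated with a minimal sketch. This is not one of the paper's three selection algorithms: the choice of the discrete cosine transform, the rule of ranking coefficients by total energy, the synthetic records, and the names `dct_matrix` and `compress` are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (illustrative helper)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # rescale the DC row so the rows are orthonormal
    return c

def compress(records, keep):
    """Transform each record and keep the `keep` coefficient positions
    with the largest total energy across all records (assumed rule)."""
    coeffs = records @ dct_matrix(records.shape[1]).T
    energy = (coeffs ** 2).sum(axis=0)
    top = np.sort(np.argsort(energy)[::-1][:keep])
    return coeffs[:, top], top

# Synthetic, smooth records (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 32)
records = np.array([a * np.sin(2 * np.pi * t) + b * t
                    for a, b in rng.uniform(1, 5, size=(6, 2))])

reduced, kept = compress(records, keep=4)

# Distance between the first two records, before and after reduction.
d_true = np.linalg.norm(records[0] - records[1])
d_approx = np.linalg.norm(reduced[0] - reduced[1])
print("kept coefficient positions:", kept)
print("original distance:", d_true)
print("distance on reduced data:", d_approx)
```

Because the transform is orthonormal, the distance computed on all coefficients equals the original distance, and discarding coefficients can only shrink it. The reduced distance is therefore a lower bound that approaches the original as more of the signal energy is retained, while the original attribute values are no longer directly exposed.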
