Differential privacy preserving clustering in distributed datasets using Haar wavelet transform

The goal of privacy preserving clustering (PPC) is to protect the privacy of data during cluster analysis. Most existing PPC algorithms rely on heuristic notions of privacy and offer no provable guarantee. Differential privacy is a strong notion of privacy introduced to overcome this problem. However, techniques that preserve differential privacy suffer from a serious drawback: reduced utility. In addition, most existing PPC techniques handle high-dimensional data inefficiently. This paper proposes differential privacy-based algorithms for PPC in horizontally and vertically distributed datasets. To overcome these two drawbacks, we use orthogonal discrete wavelet transforms (DWT) to obtain perturbed data with both lower dimensionality and less added noise. The algorithms are implemented and evaluated on several well-known datasets. The results show that the proposed algorithms guarantee an appropriate level of both utility and privacy for the published data.
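The general idea of combining an orthogonal wavelet transform with noise addition can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes one level of the orthonormal Haar transform, keeps only the approximation coefficients (halving the dimensionality), and adds Laplace noise with scale `sensitivity / epsilon`. The function names and the `sensitivity`, `epsilon`, and `seed` parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def haar_approximation(record):
    """One level of the orthonormal Haar DWT, keeping only the
    approximation coefficients (halves the dimensionality)."""
    if len(record) % 2 != 0:
        raise ValueError("record length must be even")
    return [
        (record[i] + record[i + 1]) / math.sqrt(2)
        for i in range(0, len(record), 2)
    ]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(records, epsilon, sensitivity, seed=None):
    """Reduce each record with the Haar transform, then add Laplace noise.

    Assumption: `sensitivity` is supplied by the caller; how it should be
    calibrated for a given dataset is not specified here.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [
        [c + laplace_noise(scale, rng) for c in haar_approximation(r)]
        for r in records
    ]


if __name__ == "__main__":
    data = [[1.0, 1.0, 3.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
    for row in perturb(data, epsilon=1.0, sensitivity=1.0, seed=0):
        print(row)
```

Because noise is added to the reduced coefficients rather than to every original attribute, fewer noisy values are published per record. Whether this yields a formal differential privacy guarantee depends on how the sensitivity is derived, which this sketch leaves open.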
