RECONSTRUCTION OF PERTURBED DATA USING K-MEANS

A key element in preserving the privacy and confidentiality of sensitive data is the ability to evaluate the extent of potential disclosure of such data. In other words, we need to determine to what extent confidential information in a perturbed database can be compromised by attackers or snoopers. Several randomized techniques have been proposed for privacy-preserving data mining of continuous data. These approaches generally hide the sensitive data by randomly modifying the data values with additive noise, while aiming to keep the original distribution closely reconstructible at an aggregate level. The main contribution of this paper is an algorithm that accurately reconstructs the community joint density from perturbed multidimensional stream data. Any statistical question about the community can then be answered using the reconstructed joint density. There have been many efforts on community distribution reconstruction. Our research objective is to determine whether the distributions of the original and recovered data are close enough to each other regardless of the nature of the noise applied. We consider an ensemble clustering method to reconstruct the initial data distribution. To implement the algorithms we chose MATLAB, the “language of choice in industrial world”.
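The idea of additive-noise perturbation followed by cluster-based recovery can be illustrated with a minimal sketch. This is not the paper's ensemble clustering algorithm. It is a single run of plain k-means, written in Python rather than MATLAB so that it needs only the standard library. The function names (`perturb`, `kmeans`), the synthetic two-cluster data, and the noise level are all assumptions made for illustration.

```python
import math
import random


def perturb(data, sigma, rng):
    """Additive-noise perturbation: add independent zero-mean Gaussian
    noise with standard deviation sigma to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in point) for point in data]


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start.
    Returns the k centroids sorted lexicographically."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


# Hypothetical "original" data: two well-separated 2-D Gaussian clusters.
rng = random.Random(0)
original = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
original += [(rng.gauss(10.0, 1.0), rng.gauss(10.0, 1.0)) for _ in range(500)]

# The data owner releases only the perturbed records.
perturbed = perturb(original, sigma=1.0, rng=rng)

centroids_orig = kmeans(original, k=2)
centroids_pert = kmeans(perturbed, k=2)
for c_o, c_p in zip(centroids_orig, centroids_pert):
    print("original centroid:", c_o, "| from perturbed data:", c_p)
```

Because the noise in this sketch is zero-mean, the centroids estimated from the perturbed records stay close to those of the original data. The spread of each cluster, however, is inflated by the noise variance, so a full reconstruction of the joint density needs more than centroid estimates.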
