On the Generation of Privatized Synthetic Data Using Distance Transforms

Organizations are interested in research collaborations that involve sharing data with peers. However, such partnerships often carry confidentiality risks, including insider attacks and untrustworthy collaborators who might leak sensitive information. To mitigate these data sharing vulnerabilities, entities share privatized data from which sensitive information has been removed. While such data sets might offer some assurance of privacy, maintaining the statistical traits of the original data is often problematic, leading to poor data usability. This paper therefore presents a confidential synthetic data generation heuristic that combines data privacy and distance transform techniques. The heuristic generates privatized numeric synthetic data while preserving the statistical traits of the original data. Empirical results are presented from applying unsupervised learning, using k-means, to test the usability of the privatized synthetic data set. Preliminary results from this implementation show that it might be possible to generate privatized synthetic data sets with the same statistical morphological structure as the original, using data privacy and distance transform methods.

Keywords: privatized synthetic data generation; data privacy; distance transforms; k-means clustering
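The k-means usability test mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's heuristic: the abstract does not specify the privatization steps, so plain Gaussian noise addition stands in for them here. The function names (`kmeans_1d`, `add_noise`), the toy data, and all parameter values are assumptions made for illustration only. The idea is to cluster the original and the privatized values separately and compare the resulting centroids; a small gap suggests the cluster structure survived privatization.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on 1-D data (hypothetical helper).

    Centroids are initialised at evenly spaced quantiles so the
    result is deterministic for a given input.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def add_noise(values, scale, rng):
    """Stand-in privatization step (an assumption, not the paper's method):
    add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, scale) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy numeric attribute with two well-separated groups.
original = [rng.gauss(10.0, 1.0) for _ in range(100)] + [
    rng.gauss(50.0, 1.0) for _ in range(100)
]
privatized = add_noise(original, scale=2.0, rng=rng)

orig_centroids = kmeans_1d(original, k=2)
priv_centroids = kmeans_1d(privatized, k=2)

# Usability gauge: largest shift between matching centroids.
centroid_gap = max(abs(a - b) for a, b in zip(orig_centroids, priv_centroids))
print("original centroids:  ", orig_centroids)
print("privatized centroids:", priv_centroids)
print("largest centroid shift:", centroid_gap)
```

In this toy setting the centroid shift stays small relative to the distance between the two groups, which is the kind of evidence a clustering-based usability check looks for. A fuller evaluation would likely also compare cluster assignments and summary statistics.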
