Deriving private information from randomized data

Randomization has emerged as a useful technique for data disguising in privacy-preserving data mining. Its privacy properties have been studied in a number of papers. Kargupta et al. challenged the randomization schemes, and they pointed out that randomization might not be able to preserve privacy. However, it is still unclear what factors cause such a security breach, how they affect the privacy preserving property of the randomization, and what kinds of data have higher risk of disclosing their private contents even though they are randomized.We believe that the key factor is the correlations among attributes. We propose two data reconstruction methods that are based on data correlations. One method uses the Principal Component Analysis (PCA) technique, and the other method uses the Bayes Estimate (BE) technique. We have conducted theoretical and experimental analysis on the relationship between data correlations and the amount of private information that can be disclosed based our proposed data reconstructions schemes. Our studies have shown that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed.To improve privacy, we propose a modified randomization scheme, in which we let the correlation of random noises "similar" to the original data. Our results have shown that the reconstruction accuracy of both PCA-based and BE-based schemes become worse as the similarity increases.

[1]  Qi Wang,et al.  On the privacy preserving properties of random data perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[2]  Shafi Goldwasser,et al.  Multi party computations: past and present , 1997, PODC '97.

[3]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[4]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[5]  Richard W. Hamming,et al.  Numerical methods for scientists and engineers (2nd ed.) , 1986 .

[6]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[7]  Richard Bronson Linear Algebra: An Introduction , 1995 .

[8]  Philip S. Yu,et al.  Bottom-up generalization: a data mining solution to privacy protection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[9]  I. Jolliffe Principal Component Analysis , 2002 .

[10]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[11]  Chris Clifton,et al.  When do data mining results violate privacy? , 2004, KDD.

[12]  H. Vincent Poor,et al.  An Introduction to Signal Detection and Estimation , 1994, Springer Texts in Electrical Engineering.

[13]  H. Vincent Poor,et al.  An introduction to signal detection and estimation (2nd ed.) , 1994 .

[14]  Ran Wolff,et al.  k-TTP: a new privacy model for large-scale distributed environments , 2004, KDD.

[15]  W. Härdle,et al.  Applied Multivariate Statistical Analysis , 2003 .

[16]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[17]  Yunghsiang Sam Han,et al.  Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification , 2004, SDM.

[18]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[19]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[20]  Xiaodong Lin,et al.  Privacy preserving regression modelling via distributed computation , 2004, KDD.

[21]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[22]  Benny Pinkas,et al.  Cryptographic techniques for privacy-preserving data mining , 2002, SKDD.

[23]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.

[24]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[25]  Rebecca N. Wright,et al.  Privacy-preserving Bayesian network structure computation on distributed heterogeneous data , 2004, KDD.

[26]  Richard S. Varga,et al.  Proof of Theorem 5 , 1983 .

[27]  Hillol Kargupta,et al.  Privacy-sensitive Bayesian network parameter learning , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[28]  Richard Barrett,et al.  Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.