Analysis of privacy preserving random perturbation techniques: further explorations

Privacy is an increasingly important issue in many data mining applications, particularly in the security and defense area. This has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion to mask the data and thereby preserve privacy: they attempt to hide sensitive data by randomly modifying the data values with additive noise. This paper questions the utility of such randomized data distortion techniques for preserving privacy in many cases and urges caution. It notes that random objects (particularly random matrices) have "predictable" structures in the spectral domain, and it then offers a random matrix-based spectral filtering technique that retrieves the original data from a data set distorted by adding random values. The paper extends our earlier work, which questioned the efficacy of additive-noise random perturbation techniques for privacy-preserving data mining in the continuous-valued domain, and presents new results in the discrete domain. It shows that the growing collection of random perturbation-based "privacy-preserving" data mining techniques may need careful scrutiny in order to prevent privacy breaches through linear transformations. The paper also presents extensive experimental results to support this claim.
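
The idea of spectral filtering against additive noise can be illustrated with a minimal sketch. It is not necessarily the exact procedure of the paper. It assumes that the attacker knows the noise variance, that the noise entries are i.i.d. with zero mean, and that the original data are strongly correlated (low rank here). The function name `spectral_filter`, the synthetic data, and all parameter values are hypothetical and chosen only for illustration. The cut-off used is the standard Marchenko-Pastur upper edge for the eigenvalues of a pure-noise sample covariance matrix.

```python
import numpy as np

def spectral_filter(perturbed, noise_var):
    """Estimate the original data from data + i.i.d. additive noise.

    Assumption (hypothetical setup): noise_var is known to the attacker.
    """
    m, n = perturbed.shape
    # Marchenko-Pastur upper edge for an m x n pure-noise matrix:
    # noise eigenvalues of the sample covariance concentrate below it.
    lam_max = noise_var * (1.0 + np.sqrt(n / m)) ** 2

    # Center the columns and form the sample covariance matrix.
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean
    cov = centered.T @ centered / m

    # Keep only eigenvectors whose eigenvalues exceed the noise edge.
    eigvals, eigvecs = np.linalg.eigh(cov)
    signal_vecs = eigvecs[:, eigvals > lam_max]

    # Project the perturbed data onto the retained "signal" subspace.
    return centered @ signal_vecs @ signal_vecs.T + mean

# Synthetic example: 20 correlated attributes driven by 2 latent factors.
rng = np.random.default_rng(0)
m, n, k = 2000, 20, 2
original = rng.normal(size=(m, k)) @ (3.0 * rng.normal(size=(k, n)))
noise = rng.normal(scale=1.0, size=(m, n))   # additive masking noise
perturbed = original + noise

recovered = spectral_filter(perturbed, noise_var=1.0)
err_before = np.mean((perturbed - original) ** 2)
err_after = np.mean((recovered - original) ** 2)
print(f"MSE of perturbed data: {err_before:.3f}")
print(f"MSE after filtering:   {err_after:.3f}")
```

In this setup the filtered estimate should be noticeably closer to the original values than the perturbed data are, which is the kind of privacy breach the abstract warns about. The gain depends on how strongly the attributes are correlated; for data with no low-dimensional structure, the projection would remove little noise.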
