Deriving Private Information from Perturbed Data Using IQR Based Approach

Several randomization techniques have been proposed for privacy-preserving data mining of continuous data. These approaches generally hide sensitive data by adding random noise to the data values, while aiming to reconstruct the original distribution closely at the aggregate level. An open question, however, is whether attackers or snoopers can exploit the reconstructed distribution to derive sensitive individual data. This paper presents a simple attack that applies the Inter-Quantile Range to the reconstructed distribution. The experimental results show that current random perturbation-based privacy-preserving data mining techniques may need careful scrutiny to prevent privacy breaches through this model-based inference.
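
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact procedure: the Gaussian data and noise, the sample size, the 25%/75% quantile levels, and the helper names perturb, inter_quantile_range, and coverage are all assumptions made for illustration. The reconstruction step is also not implemented; a fresh sample from the true distribution stands in for the reconstructed distribution.

```python
import random
import statistics


def perturb(values, noise_sd, rng):
    """Additive-noise randomization: publish x + r, with r ~ N(0, noise_sd^2)."""
    return [x + rng.gauss(0.0, noise_sd) for x in values]


def inter_quantile_range(sample, lower=0.25, upper=0.75):
    """Return the (lower, upper) quantiles of a sample as an interval."""
    cuts = statistics.quantiles(sample, n=100, method="inclusive")  # 99 cut points
    return cuts[round(lower * 100) - 1], cuts[round(upper * 100) - 1]


def coverage(values, interval):
    """Fraction of values that fall inside the interval."""
    lo, hi = interval
    return sum(lo <= x <= hi for x in values) / len(values)


rng = random.Random(0)

# Hypothetical sensitive attribute and its published, perturbed version.
original = [rng.gauss(50.0, 10.0) for _ in range(5000)]
published = perturb(original, noise_sd=20.0, rng=rng)

# Assumption: the attacker holds a good reconstruction of the original
# distribution. A fresh sample from the same distribution stands in for it.
reconstructed = [rng.gauss(50.0, 10.0) for _ in range(5000)]

attack_interval = inter_quantile_range(reconstructed)
published_interval = inter_quantile_range(published)
attack_coverage = coverage(original, attack_interval)

print("interval from reconstructed distribution:", attack_interval)
print("interval from published data alone:      ", published_interval)
print("share of original values inside the attack interval:", attack_coverage)
```

In this sketch the interval derived from the reconstructed distribution is much narrower than the one derived from the published values, yet it still contains roughly half of the original values. That gap between the apparent protection of the noise and the interval an attacker can state is the kind of leakage the abstract refers to.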
