Two Noise Addition Methods For Privacy-Preserving Data Mining

In the last decade, a growing body of research has focused on privacy-preserving data mining (PPDM). Previous work can be divided into two categories: data modification and data encryption. Data encryption is used less widely than data modification because of its high computation and communication costs. Data perturbation, which includes additive noise, multiplicative noise, matrix multiplication, data swapping, data shuffling, k-anonymization, and blocking, is an important technique within data modification. PPDM has two goals, privacy and accuracy, which are often at odds with each other. This paper proposes two new noise addition methods for perturbing the original data and then discusses how they meet both goals. Experiments show that the proposed methods achieve higher accuracy than existing ones at the same privacy strength.
