Preservation Of Patterns and Input-Output Privacy

Privacy-preserving data mining has so far focused mainly on the data collector scenario, in which individuals supply their personal data to an untrusted collector in exchange for value. In this scenario, random perturbation has proved very successful. An equally compelling but overlooked scenario is that of a data custodian, which either owns the data or is explicitly entrusted with ensuring the privacy of individual data. In this scenario, we show that it is possible to minimize disclosure while guaranteeing that the mining outcome does not change. We conduct our investigation in the context of building a decision tree and propose transformations that preserve the exact decision tree. We show with a detailed set of experiments that these transformations provide substantial protection to both input data privacy and mining output privacy.
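To make the idea of an outcome-preserving transformation concrete, the following is a minimal sketch, not the method proposed in the paper. The abstract does not say which transformations are used; the sketch assumes a strictly increasing map (the hypothetical `hide` function) applied to one numeric attribute, and uses a simple Gini-based threshold search (`best_split`, also hypothetical) as a stand-in for one step of decision tree induction. Because a strictly increasing map keeps the records in the same order, the best split partitions the records identically on the original and the transformed values.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return the set of record indices sent left by the best threshold split.

    Candidate thresholds lie between consecutive distinct sorted values; the
    split with the lowest weighted Gini impurity wins (first one on ties).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    best_score, best_left = float("inf"), frozenset()
    for k in range(1, n):
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold separates equal values
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        score = (k * gini(left) + (n - k) * gini(right)) / n
        if score < best_score:
            best_score, best_left = score, frozenset(order[:k])
    return best_left

def hide(x):
    """Hypothetical strictly increasing transformation (illustration only)."""
    return 3.0 * math.log(x) + 7.0

# Made-up toy data: one numeric attribute and a binary class label.
ages = [23, 35, 41, 52, 29, 64, 47, 38]
labels = ["no", "no", "yes", "yes", "no", "yes", "yes", "no"]
hidden = [hide(a) for a in ages]

# The same records go left and right whether we mine the raw or hidden values.
same_partition = best_split(ages, labels) == best_split(hidden, labels)
print(same_partition)
```

In this sketch only the numeric threshold differs between the two runs. A custodian who knows `hide` could map the threshold found on the hidden values back to the original scale, while a party that sees only the hidden values would not learn the original attribute values directly.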