OptRR: Optimizing Randomized Response Schemes for Privacy-Preserving Data Mining

The randomized response (RR) technique is a promising technique to disguise private categorical data in privacy-preserving data mining (PPDM). Although a number of RR-based methods have been proposed for various data mining computations, no study has systematically compared them to find optimal RR schemes. The difficulty of comparison lies in the fact that to compare two PPDM schemes, one needs to consider two conflicting metrics: privacy and utility. An optimal scheme based on one metric is usually the worst based on the other metric. In this paper, we first describe a method to quantify privacy and utility. We formulate the quantification as estimate problems, and use estimate theories to derive quantification. We then use an evolutionary multi-objective optimization method to find optimal disguise matrices for the randomized response technique. The experimental results have shown that our scheme has a much better performance than the existing RR schemes.

[1]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[2]  Pedro Domingos KDD-2003 : proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2003, Washington, DC, USA , 2003 .

[3]  Jayant R. Haritsa,et al.  A Framework for High-Accuracy Privacy-Preserving Mining , 2005, ICDE.

[4]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[5]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[6]  Wenliang Du,et al.  Deriving private information from randomized data , 2005, SIGMOD '05.

[7]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[8]  Charles Gide,et al.  Cours d'économie politique , 1911 .

[9]  Marco Laumanns,et al.  SPEA2: Improving the Strength Pareto Evolutionary Algorithm For Multiobjective Optimization , 2002 .

[10]  Jayant R. Haritsa,et al.  On Addressing Efficiency Concerns in Privacy-Preserving Mining , 2003, DASFAA.

[11]  Qi Wang,et al.  On the privacy preserving properties of random data perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[12]  J. Dennis,et al.  A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems , 1997 .

[13]  Ramakrishnan Srikant,et al.  Privacy preserving OLAP , 2005, SIGMOD '05.

[14]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[15]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.