
Privacy-Preserving Decision Tree Mining Based on Random Substitutions

Privacy-preserving decision tree mining is an important problem that is not yet thoroughly understood. In fact, the privacy-preserving decision tree mining method explored in the pioneering paper [1] was recently shown to be completely broken, because its data perturbation technique is fundamentally flawed [2]. However, since the general framework presented in [1] has several useful features in practice, it is natural to ask whether the framework can be rescued by, for example, using a different data perturbation technique. In this paper, we answer this question affirmatively by presenting such a data perturbation technique based on random substitutions. We show that the resulting privacy-preserving decision tree mining method is immune to the attacks that appear relevant, including the one introduced in [2]. Systematic experiments show that it is also effective.

Shouhuai Xu | Weining Zhang | Jim Dowd
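The abstract does not spell out the substitution scheme. As a rough illustration only, the sketch below assumes a γ-diagonal perturbation matrix in the style of the FRAPP framework ("A Framework for High-Accuracy Privacy-Preserving Mining", listed in the references): a categorical value v in {0, ..., n-1} is kept with probability γx and replaced by each other value with probability x, where x = 1/(γ + n - 1). Because the matrix is public, a miner can estimate the original value counts from the perturbed ones, a simplified form of the "reconstruct distributions, then mine" idea. The function names and parameters are hypothetical, and this is not claimed to be the authors' exact construction.

```python
import random
from collections import Counter

# Hypothetical sketch: assumes a FRAPP-style gamma-diagonal perturbation
# matrix; not necessarily the substitution scheme used in the paper.


def gamma_diagonal_matrix(n, gamma):
    """M[u][v] = P(perturbed = u | original = v): gamma*x on the diagonal,
    x elsewhere, with x = 1/(gamma + n - 1), so each column sums to 1."""
    x = 1.0 / (gamma + n - 1)
    return [[gamma * x if u == v else x for v in range(n)] for u in range(n)]


def perturb(values, n, gamma, rng):
    """Random substitution of categorical values in range(n): keep v with
    probability gamma*x, otherwise replace it by one of the other n-1
    values chosen uniformly (each with probability x)."""
    x = 1.0 / (gamma + n - 1)
    out = []
    for v in values:
        if rng.random() < gamma * x:
            out.append(v)
        else:
            u = rng.randrange(n - 1)           # index among the n-1 other values
            out.append(u if u < v else u + 1)  # skip over v itself
    return out


def reconstruct_counts(perturbed, n, gamma):
    """Unbiased estimate X of the original value counts from the observed
    counts Y, i.e. X = M^{-1} Y, via the closed-form inverse of the
    gamma-diagonal matrix: X_k = ((gamma + n - 1) * Y_k - N) / (gamma - 1)."""
    if gamma == 1:
        raise ValueError("gamma = 1 makes M singular: output is independent of input")
    total = len(perturbed)
    y = Counter(perturbed)
    return [((gamma + n - 1) * y[k] - total) / (gamma - 1) for k in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    original = rng.choices(range(4), weights=[50, 30, 15, 5], k=20000)
    noisy = perturb(original, 4, 5.0, rng)
    print("true counts:     ", [Counter(original)[k] for k in range(4)])
    print("estimated counts:", [round(c) for c in reconstruct_counts(noisy, 4, 5.0)])
```

The closed-form inverse avoids a general linear solve. With γ = 1 every output value is equally likely whatever the input, so nothing can be recovered; a larger γ keeps more values intact, which improves estimation accuracy at the cost of privacy.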

[1] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[2] Charu C. Aggarwal, et al. On the design and quantification of privacy preserving data mining algorithms, 2001, PODS.

[3] Jayant R. Haritsa, et al. A Framework for High-Accuracy Privacy-Preserving Mining, 2005, ICDE.

[4] S. L. Warner, et al. Randomized response: a survey technique for eliminating evasive answer bias, 1965, Journal of the American Statistical Association.

[5] Alexandre V. Evfimievski, et al. Limiting privacy breaches in privacy preserving data mining, 2003, PODS.

[6] Mihir Bellare. Advances in Cryptology — CRYPTO 2000, 2000, Lecture Notes in Computer Science.

[7] Yehuda Lindell, et al. Privacy Preserving Data Mining, 2002, Journal of Cryptology.

[8] Jayant R. Haritsa, et al. Maintaining Data Privacy in Association Rule Mining, 2002, VLDB.

[9] Aiko M. Hormann, et al. Programs for Machine Learning. Part I, 1962, Inf. Control.

[10] Joydeep Ghosh, et al. Privacy-preserving distributed clustering using generative models, 2003, Third IEEE International Conference on Data Mining.

[11] Wenliang Du, et al. Deriving private information from randomized data, 2005, SIGMOD '05.

[12] W. H. Williams, et al. The Variance of an Estimator with Post-Stratified Weighting, 1962.

[13] Martín Abadi, et al. Security analysis of cryptographically controlled access to XML documents, 2005, PODS '05.

[14] Rakesh Agrawal, et al. Privacy-preserving data mining, 2000, SIGMOD 2000.

[15] Qi Wang, et al. On the privacy preserving properties of random data perturbation techniques, 2003, Third IEEE International Conference on Data Mining.

[16] L. Willenborg, et al. Elements of Statistical Disclosure Control, 2000.

[17] Cynthia Dwork, et al. Privacy-Preserving Datamining on Vertically Partitioned Databases, 2004, CRYPTO.

[18] Alexandre V. Evfimievski, et al. Privacy preserving mining of association rules, 2002, Inf. Syst.