A hybrid multi-group approach for privacy-preserving data mining

In this paper, we propose a hybrid multi-group approach for privacy-preserving data mining and make two contributions. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party computation (SMC) approach. These two approaches have complementary features: the randomization approach is much more efficient but less accurate, while the SMC approach is less efficient but more accurate. Our hybrid approach takes advantage of the strengths of both to balance accuracy and efficiency constraints. Compared to the two existing approaches, it can achieve much better accuracy than the randomization approach and a much lower computation cost than the SMC approach. Second, we propose a multi-group scheme that gives the data miner flexible control over the balance between data mining accuracy and privacy. This scheme is motivated by the fact that existing randomization schemes, which randomize data at the level of individual attributes, can produce insufficient accuracy when the number of dimensions is high. We partition attributes into groups and develop a scheme that conducts group-based randomization to achieve better data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3 decision tree algorithm and the association rule mining problem, and we present experimental results.
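To make the idea of group-based randomization concrete, the following is a minimal sketch and not the scheme evaluated in the paper. It assumes binary attributes and a randomized-response style perturbation in which each group of attributes is either reported truthfully (with probability `theta`) or reported with every bit in the group flipped. The function names, the parameter `theta`, and the choice of flipping whole groups are all illustrative assumptions.

```python
import random


def randomize_group(bits, theta, rng):
    """Report the group's true bits with probability theta, else all bits flipped.

    Assumption: binary attributes and an all-or-nothing flip per group.
    """
    if rng.random() < theta:
        return tuple(bits)
    return tuple(1 - b for b in bits)


def randomize_record(record, groups, theta, rng):
    """Perturb a record group by group.

    `groups` is a list of index lists that partitions the attributes
    (a hypothetical partition chosen by the data miner).
    """
    out = list(record)
    for idx in groups:
        flipped = randomize_group([record[i] for i in idx], theta, rng)
        for i, b in zip(idx, flipped):
            out[i] = b
    return tuple(out)


def estimate_pattern_fraction(reports, pattern, theta):
    """Estimate the true fraction of records whose group equals `pattern`.

    With E the pattern and E' its bitwise complement, the reported fractions satisfy
        P*(E)  = theta * P(E)  + (1 - theta) * P(E')
        P*(E') = theta * P(E') + (1 - theta) * P(E)
    and solving for P(E) gives the expression returned below.
    """
    if abs(2 * theta - 1) < 1e-12:
        raise ValueError("theta = 0.5 makes the estimate undefined")
    pattern = tuple(pattern)
    complement = tuple(1 - b for b in pattern)
    n = len(reports)
    p_star = sum(tuple(r) == pattern for r in reports) / n
    p_star_comp = sum(tuple(r) == complement for r in reports) / n
    return (theta * p_star - (1 - theta) * p_star_comp) / (2 * theta - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: 60% of records have group value (1, 0), 40% have (0, 1).
    truth = [(1, 0)] * 6000 + [(0, 1)] * 4000
    reports = [randomize_record(r, [[0, 1]], 0.8, rng) for r in truth]
    print(estimate_pattern_fraction(reports, (1, 0), 0.8))
```

In this sketch, a larger group keeps the joint pattern of its attributes intact in every report, which is what helps accuracy for multi-attribute counts, while a `theta` closer to 0.5 hides more about each individual record at the cost of a noisier estimate.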
