A classification-based framework for privacy-preserving data mining

The information age has enabled many organizations to gather huge volumes of data. When two parties owning confidential databases wish to run a data mining algorithm on the union of their databases without revealing any unnecessary information, the privileged information must be protected. The aim of a classification problem is to assign each transaction to one of a discrete set of possible categories. The secure multiparty computation problems that need to be solved in this setting are finding the class value with the most transactions and determining whether all the transactions have the same class attribute. In this paper we demonstrate the difference between the Gini index and entropy as attribute selection measures, and prove that pruning benefits both accuracy and privacy.
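The quantities named in the abstract can be illustrated with a small plaintext sketch. It is not the paper's protocol and offers no privacy: it assumes each party is willing to share its per-class transaction counts, which a secure multiparty protocol would avoid. The function names and the sample counts are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def gini(counts):
    """Gini index of a class distribution: 1 - sum(p_i^2)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution: -sum(p_i * log2(p_i))."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def majority_class(counts):
    """Class value with the most transactions (ties broken alphabetically)."""
    return max(sorted(counts), key=lambda k: counts[k])


def all_same_class(counts):
    """True when every transaction carries the same class attribute."""
    return sum(1 for c in counts.values() if c > 0) == 1


# Hypothetical per-class counts held by two parties (assumed data).
party_a = Counter({"yes": 3, "no": 1})
party_b = Counter({"yes": 2, "no": 2})

# In the clear, the union of the databases is just the sum of the counts.
# A privacy-preserving protocol would compute the results below without
# either party learning the other's counts.
combined = party_a + party_b

print("Gini index:", gini(combined))
print("Entropy:", entropy(combined))
print("Majority class:", majority_class(combined))
print("All same class:", all_same_class(combined))
```

Both measures are zero for a pure node and largest for a uniform class distribution; they differ in how strongly they penalize mixed distributions, which is why they can rank candidate split attributes differently.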
