Privacy preserving C4.5 using Gini index

Privacy has become a major concern, and the traditional security goals of confidentiality, integrity and availability do not by themselves ensure it. Data mining is a threat to privacy, so researchers today focus on how to preserve privacy while performing data mining tasks. Because data mining algorithms are typically complex and their input usually consists of massive data sets, generic protocols are of no practical use in this setting, and more efficient protocols are required. This paper focuses on the problem of decision tree learning with the C4.5 algorithm. C4.5, an extension of ID3, is a very popular decision tree building method in data mining. Entropy and the Gini index are two different splitting criteria used in ID3. Some work exists on privacy preserving ID3 using entropy, but not much has been done for the Gini index. This paper proposes modified protocols, based on secure multiparty computation, for privacy preserving C4.5 using the Gini index over distributed partitioned data; the protocols do not require any third-party server. However, some communication overhead is necessary so that the parties can carry out the secure protocols. Results such as the ROC (Receiver Operating Characteristic) graph and detailed accuracy through the cost counting index are shown.
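To make the Gini criterion concrete, the following is a minimal illustrative sketch and not the protocol proposed in the paper. It computes the Gini index from class counts and, assuming horizontally partitioned data, combines each party's local counts with a toy additive-secret-sharing sum. All names (`gini`, `gini_split`, `share`, `secure_sum`, `MODULUS`) are hypothetical, and the toy sum reveals the aggregated counts, which a real secure multiparty protocol may avoid.

```python
import random

MODULUS = 2**61 - 1  # assumed modulus for the toy additive shares


def gini(counts):
    """Gini index of a node, given the number of records in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(partitions):
    """Weighted Gini index of a split; each partition is a list of class counts."""
    n = sum(sum(p) for p in partitions)
    if n == 0:
        return 0.0
    return sum(sum(p) / n * gini(p) for p in partitions)


def share(value, n_parties, rng):
    """Split a value into n_parties random shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, rng):
    """Toy secure sum: party i sends its j-th share to party j, each party
    publishes only the sum of the shares it received, and the published
    partial sums add up to the total."""
    n = len(private_values)
    all_shares = [share(v, n, rng) for v in private_values]
    partial = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial) % MODULUS


# Two parties hold local class counts (class A, class B) for the same node.
rng = random.Random(0)
local_counts = [[3, 1], [2, 4]]
joint_counts = [secure_sum([party[k] for party in local_counts], rng) for k in range(2)]
node_gini = gini(joint_counts)
```

In this sketch no party sees another party's individual counts, only random-looking shares and the final totals. A tree builder would evaluate `gini_split` for each candidate attribute and pick the split with the lowest value.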
