Outsourcing Privacy Preserving ID3 Decision Tree Algorithm over Encrypted Data-sets for Two-Parties

ID3 decision tree learning is a popular and widely studied data analysis technique with a range of applications. In this paper, we focus on privacy-preserving ID3 decision tree learning over horizontally partitioned datasets, where data owners wish to learn a decision tree from their collective dataset while disclosing minimal information about their own sensitive data. Specifically, we consider a scenario in which multiple parties with weak computational power need to run the ID3 algorithm jointly on their databases while outsourcing both the databases and most of the protocol's computation to the cloud. Each party should obtain the correct result computed over the data of all parties, and the data owned by each party should remain confidential from both the other parties and the cloud. To ensure data privacy, we modify the Secure Equivalent Testing Protocol (SET) and design the Outsourced Secure Shared x ln x Protocol (OSSx ln x) along with other sub-protocols. Based on these protocols, we then propose a cloud-aided, outsourced privacy-preserving ID3 data mining solution.
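To illustrate why an x ln x sub-protocol is a natural building block here, the following plaintext sketch shows how ID3's attribute selection on horizontally partitioned data reduces to sums of x ln x terms, where each x is a count split additively between two parties. It is not the protocol of this paper: nothing is encrypted or outsourced, and all names (`xlnx`, `local_counts`, `scaled_conditional_entropy`, `best_attribute`, the `"label"` key) are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def xlnx(x):
    """Return x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def local_counts(rows, attr):
    """Count (attribute value, class label) pairs in one party's rows."""
    return Counter((row[attr], row["label"]) for row in rows)


def scaled_conditional_entropy(counts_a, counts_b):
    """Compute N * H(class | attribute) from two parties' local counts.

    Each joint count is the sum of the two local counts, so
    N * H = sum_v xlnx(n_v) - sum_{v,c} xlnx(n_vc).
    In a private protocol these sums would never be revealed in the
    clear; here they are computed in plaintext (an assumption made
    purely for illustration).
    """
    joint = Counter(counts_a)
    joint.update(counts_b)  # adds counts key by key
    per_value = Counter()
    for (value, _label), n in joint.items():
        per_value[value] += n
    return (sum(xlnx(n) for n in per_value.values())
            - sum(xlnx(n) for n in joint.values()))


def best_attribute(rows_a, rows_b, attrs):
    """Pick the attribute with the lowest conditional entropy,
    which is the attribute with the highest information gain."""
    return min(
        attrs,
        key=lambda a: scaled_conditional_entropy(
            local_counts(rows_a, a), local_counts(rows_b, a)
        ),
    )


# Toy data: each party holds different rows over the same attributes.
rows_a = [
    {"outlook": "sunny", "windy": "yes", "label": "no"},
    {"outlook": "rain", "windy": "yes", "label": "yes"},
]
rows_b = [
    {"outlook": "sunny", "windy": "no", "label": "no"},
    {"outlook": "rain", "windy": "no", "label": "yes"},
]
chosen = best_attribute(rows_a, rows_b, ["outlook", "windy"])
```

Because the class entropy of the whole dataset is the same for every candidate attribute, minimizing the scaled conditional entropy selects the same attribute as maximizing information gain, so only the x ln x sums over jointly held counts need to be evaluated.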
