DPETs: A Differentially Private ExtraTrees

In this paper, we consider the problem of constructing private classifiers from extremely randomized decision trees (ExtraTrees) within the framework of differential privacy. We propose DPETs, a differentially private classifier that applies the Laplace mechanism and the exponential mechanism while building each decision tree, specifically when choosing split points and selecting attributes. We use the Gini index as the scoring function of the exponential mechanism, allocate the privacy budget dynamically by tracking its consumption, and use the Laplace mechanism to add noise to the counts of each equivalence class. DPETs satisfies differential privacy throughout the whole process. Because feature selection and splitting are randomized, less noise is needed to ensure privacy than in the construction of traditional differentially private decision trees, so the accuracy of the classifier improves, especially on high-dimensional datasets with discrete attributes.
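The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the DPETs algorithm itself: the function names, the use of the negated weighted Gini impurity as the score, and the `sensitivity` value passed in the usage example are all assumptions made for illustration, and the dynamic budget allocation is not modeled.

```python
import math
import random


def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_k^2) of one group of labelled records."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def split_score(partitions):
    """Negated weighted Gini impurity of a candidate split (higher is better).

    `partitions` holds one class-count list per branch (assumed layout).
    """
    total = sum(sum(p) for p in partitions)
    if total == 0:
        return 0.0
    return -sum(sum(p) / total * gini_impurity(p) for p in partitions)


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count, epsilon, rng):
    """A counting query has sensitivity 1, so the noise scale is 1 / epsilon."""
    return count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical candidate splits of 16 records with two classes.
    candidates = [[[8, 0], [0, 8]], [[4, 4], [4, 4]]]
    scores = [split_score(c) for c in candidates]
    # sensitivity=2.0 is an assumed value, not one stated in the abstract.
    chosen = exponential_mechanism(scores, epsilon=1.0, sensitivity=2.0, rng=rng)
    print("chosen split:", chosen)
    print("noisy leaf count:", noisy_count(8, epsilon=0.5, rng=rng))
```

A smaller `epsilon` flattens the selection probabilities and widens the Laplace noise, which is the privacy and accuracy trade-off the abstract refers to.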
