Differential private random forest

Organizations be it private or public often collect personal information about an individual who are their customers or clients. The personal information of an individual is private and sensitive which has to be secured from data mining algorithm which an adversary may apply to get access to the private information. In this paper we have consider the problem of securing these private and sensitive information when used in random forest classifier in the framework of differential privacy. We have incorporated the concept of differential privacy to the classical random forest algorithm. Experimental results shows that quality functions such as information gain, max operator and gini index gives almost equal accuracy regardless of their sensitivity towards the noise. Also the accuracy of the classical random forest and the differential private random forest is almost equal for different size of datasets. The proposed algorithm works for datasets with categorical as well as continuous attributes.

[1]  Assaf Schuster,et al.  Data mining with differential privacy , 2010, KDD.

[2]  Ashwin Machanavajjhala,et al.  Privacy: Theory meets Practice on the Map , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[3]  John Mingers,et al.  An empirical comparison of selection measures for decision-tree induction , 2004, Machine Learning.

[4]  Philip S. Yu,et al.  Privacy-preserving data publishing: A survey of recent developments , 2010, CSUR.

[5]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[6]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[7]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[8]  Philip S. Yu,et al.  Differentially private data release for data mining , 2011, KDD.

[9]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[10]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[11]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[12]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[13]  Kamalika Chaudhuri,et al.  Privacy-preserving logistic regression , 2008, NIPS.

[14]  P. Grünwald The Minimum Description Length Principle (Adaptive Computation and Machine Learning) , 2007 .

[15]  Daniel A. Spielman,et al.  Spectral Graph Theory and its Applications , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[16]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[17]  Anand D. Sarwate,et al.  Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data , 2013, IEEE Signal Processing Magazine.