Multinomial Random Forest: Toward Consistency and Privacy-Preservation

Despite the impressive empirical performance of standard random forests (RF), their theoretical properties are not thoroughly understood. In this paper, we propose a novel RF framework, dubbed multinomial random forest (MRF), to discuss consistency and privacy preservation. Instead of a deterministic greedy split rule, the MRF adopts two impurity-based multinomial distributions to randomly select a split feature and a split value, respectively. Theoretically, we prove the consistency of the proposed MRF and analyze its privacy preservation within the framework of differential privacy. We also demonstrate on multiple datasets that its performance is on par with that of the standard RF. To the best of our knowledge, MRF is the first consistent RF variant whose performance is comparable to that of the standard RF.
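The core mechanism described above, sampling a split feature and a split value from impurity-based multinomial distributions rather than greedily choosing the best one, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the `temperature` parameter, the Gini criterion, and the softmax weighting (reminiscent of the exponential mechanism used in differential privacy) are assumptions made for concreteness.

```python
import numpy as np

def gini_impurity(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_decrease(X, y, feature, threshold):
    """Impurity decrease achieved by splitting on `feature` at `threshold`."""
    left = X[:, feature] <= threshold
    right = ~left
    if left.sum() == 0 or right.sum() == 0:
        return 0.0
    n = len(y)
    return (gini_impurity(y)
            - (left.sum() / n) * gini_impurity(y[left])
            - (right.sum() / n) * gini_impurity(y[right]))

def sample_split(X, y, rng, temperature=1.0):
    """Sample a split feature, then a split value, each from a multinomial
    distribution weighted by impurity decrease (softmax weights; the
    temperature is a hypothetical knob trading randomness for greediness)."""
    n_features = X.shape[1]
    thresholds = [np.unique(X[:, j]) for j in range(n_features)]
    # Best achievable impurity decrease per feature.
    best = np.array([
        max(impurity_decrease(X, y, j, t) for t in thresholds[j])
        for j in range(n_features)
    ])
    # First multinomial: pick a feature in proportion to exp(temperature * gain).
    w = np.exp(temperature * best)
    feature = rng.choice(n_features, p=w / w.sum())
    # Second multinomial: pick a split value of the chosen feature.
    gains = np.array([impurity_decrease(X, y, feature, t)
                      for t in thresholds[feature]])
    wv = np.exp(temperature * gains)
    value = rng.choice(thresholds[feature], p=wv / wv.sum())
    return feature, value
```

With high temperature the sampler concentrates on the greedy split (recovering standard CART-style behavior), while lower temperatures spread probability mass over suboptimal splits, which is the randomness that the consistency and privacy analyses rely on.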