Tuning parameters in random forests

Breiman's (2001) random forests are a very popular class of learning algorithms, often able to produce good predictions even in high-dimensional settings without careful tuning of their inner parameters. Unfortunately, there are no theoretical results supporting the default values used for these parameters in Breiman's algorithm. The aim of this paper is therefore to present recent theoretical results that provide insight into the role and the tuning of these parameters.
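For concreteness, the minimal sketch below (not taken from the paper) shows where these tuning parameters appear in practice, using scikit-learn's RandomForestRegressor as a stand-in for Breiman's procedure. The mapping of max_features, min_samples_leaf, and max_samples to Breiman's mtry, nodesize, and subsample size, as well as the chosen values, are illustrative assumptions of this example.

```python
# Illustrative sketch only: the main tuning parameters of a Breiman-style
# random forest, expressed with scikit-learn's RandomForestRegressor.
# Parameter names are scikit-learn's; the comments give the usual
# random-forest terminology they correspond to.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, just to make the example runnable.
X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

forest = RandomForestRegressor(
    n_estimators=100,       # number of trees M; larger M mainly reduces variance
    max_features=1.0 / 3.0, # "mtry": fraction of features tried at each split
    min_samples_leaf=5,     # "nodesize": minimum leaf size, controls tree depth
    max_depth=None,         # grow trees fully (no pruning) up to the leaf-size constraint
    bootstrap=True,         # each tree is fit on a resampled version of the data
    max_samples=None,       # subsample size a_n; None means a_n = n
    random_state=0,
)
forest.fit(X, y)
print(forest.score(X, y))
```

The values above mirror the defaults commonly used in implementations of Breiman's algorithm for regression (mtry of about p/3 and nodesize of 5); classification implementations typically default to mtry of about sqrt(p) and nodesize of 1.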

References

[1] C. J. Stone, et al. Additive Regression and Other Nonparametric Models, 1985.

[2] Antonio Criminisi, et al. Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, 2012, Found. Trends Comput. Graph. Vis.

[3] Nicolai Meinshausen, et al. Quantile Regression Forests, 2006, J. Mach. Learn. Res.

[4] L. Breiman. Consistency for a Simple Model of Random Forests, 2004.

[5] Ramón Díaz-Uriarte, et al. Gene selection and classification of microarray data using random forest, 2006, BMC Bioinformatics.

[6] Erwan Scornet, et al. On the asymptotics of random forests, 2014, J. Multivar. Anal.

[7] Robin Genuer, et al. Variance reduction in purely random forests, 2012.

[8] Jean-Michel Poggi, et al. Variable selection using random forests, 2010, Pattern Recognit. Lett.

[9] Luc Devroye, et al. Consistency of Random Forests and Other Averaging Classifiers, 2008, J. Mach. Learn. Res.

[10] Anne-Laure Boulesteix, et al. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics, 2012, WIREs Data Mining Knowl. Discov.

[11] R. Tibshirani, et al. Generalized Additive Models, 1991.

[12] D. R. Cutler, et al. Utah State University, SelectedWorks, 2017.

[13] Erwan Scornet, et al. A random forest guided tour, 2015, TEST.

[14] Jean-Philippe Vert, et al. Consistency of Random Forests, 2014, arXiv:1405.2881.

[15] Luc Devroye, et al. Cellular Tree Classifiers, 2013, ALT.

[16] Leo Breiman, et al. Classification and Regression Trees, 1984.

[17] A. Prasad, et al. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction, 2006, Ecosystems.

[18] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.

[19] Hemant Ishwaran, et al. The effect of splitting on random forests, 2014, Machine Learning.

[20] Erwan Scornet, et al. Impact of subsampling and pruning on random forests, 2016, arXiv:1603.04261.

[21] Standard Errors for Bagged Predictors and Random Forests, 2013.

[22] Stefan Wager. Asymptotic Theory for Random Forests, 2014, arXiv:1405.0352.

[23] Laurent Heutte, et al. Forest-RK: A New Random Forest Induction Method, 2008, ICIC.

[24] Gérard Biau, et al. Analysis of a Random Forests Model, 2010, J. Mach. Learn. Res.

[25] Robert P. Sheridan, et al. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling, 2003, J. Chem. Inf. Comput. Sci.

[26] Sylvain Arlot, et al. Analysis of purely random forests bias, 2014, arXiv.

[27] Leo Breiman. Random Forests, 2001, Machine Learning.

[28] Toby Sharp, et al. Real-time human pose recognition in parts from single depth images, 2011, CVPR.

[30] G. Hooker, et al. Ensemble Trees and CLTs: Statistical Inference for Supervised Learning, 2014.