Random forests are a common non-parametric regression technique that performs well with mixed-type unordered data and irrelevant features, while being robust to monotonic variable transformations. Standard random forests, however, do not handle functional data efficiently and run into a curse of dimensionality when presented with high-resolution curves and surfaces. Furthermore, in settings with heteroskedasticity or multimodality, a regression point estimate with standard errors does not fully capture the uncertainty in our predictions. A more informative quantity is the conditional density p(y | x), which describes the full extent of the uncertainty in the response y given covariates x. In this paper we show how random forests can be efficiently leveraged for conditional density estimation, functional covariates, and multiple responses without increasing computational complexity. We provide open-source software for all procedures, with R and Python versions that call a common C++ library.