Bayesian Additive Regression Trees

We develop a Bayesian "sum-of-trees" model in which each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach that uses dynamic random basis elements which are dimensionally adaptive. BART is motivated by ensemble methods in general, and boosting algorithms in particular; however, BART is defined by a statistical model, a prior and a likelihood, while boosting is defined by an algorithm. This model-based approach enables a full assessment of prediction uncertainty while remaining highly competitive in prediction accuracy. The potential of BART is illustrated on examples where it compares favorably with competing methods, including gradient boosting, neural nets, and random forests. BART is also seen to be remarkably effective at finding low-dimensional structure in high-dimensional data.
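As a point of reference, the sum-of-trees model the abstract describes can be written, in the notation used in the BART literature, as:

```latex
Y = \sum_{j=1}^{m} g(x;\, T_j, M_j) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2),
```

where each $T_j$ is a binary regression tree, $M_j$ is the set of terminal-node parameters of $T_j$, and $g(x; T_j, M_j)$ returns the terminal-node value to which $x$ is assigned. The regularization prior shrinks each tree's contribution toward zero, so every tree acts as a weak learner and the fit is spread across the ensemble.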
