bartMachine: Machine Learning with Bayesian Additive Regression Trees

We present bartMachine, a new R package implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART, such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data, and the ability to save trees for future prediction. It is parallelized, significantly faster than the current R implementation, and capable of handling both large sample sizes and high-dimensional data.
