BiMM forest: A random forest method for modeling clustered and longitudinal binary outcomes.

Clustered binary outcomes and datasets with many predictor variables are frequently encountered in clinical research (e.g., longitudinal studies). Generalized linear mixed models (GLMMs), which are typically employed for clustered endpoints, face challenges in some scenarios, particularly for complex datasets that contain many interactions among predictors and predictors with nonlinear relationships to the outcome. We propose a new method called Binary Mixed Model (BiMM) forest, which combines random forest and GLMM methodology. BiMM forest is a flexible and stable method that naturally models interactions among predictors and can be applied to clustered data. Simulation studies show that BiMM forest achieves similar or superior prediction accuracy compared to standard random forest, GLMMs, and its tree counterpart (BiMM tree) for clustered binary outcomes. The method is applied to a real dataset from the Acute Liver Failure Study Group. BiMM forest offers an alternative method for modeling clustered binary outcomes that may be applied in many research settings.
