Ensemble of trees approaches to risk adjustment for evaluating a hospital’s performance

Abstract

A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to its expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances, since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For binary outcomes, the method commonly used for risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have recently been shown to predict outcomes very well in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICUs). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICUs, and therefore provide superior risk adjustment.
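The core computation described in the abstract — fit a model for the probability of the outcome given patient characteristics, then compare each hospital's observed outcome rate to the average predicted risk of its own patients — can be sketched in a few lines. The sketch below is a minimal illustration using scikit-learn on synthetic data, not the paper's code: the column names, covariates, and model settings are assumptions, the random forest stands in for either ensemble-of-trees method (BART would play the same role via a separate package), and in practice the predicted risks should be obtained out of sample (e.g., by cross-fitting) rather than from the training data as done here.

```python
# Minimal risk-adjustment sketch on synthetic data (hypothetical column names):
# estimate P(death | patient characteristics) with logistic regression and a
# random forest, then compare each hospital's observed mortality rate to the
# mean predicted risk (expected rate) of its own patients.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic patient-level data: two illustrative covariates and a hospital id.
df = pd.DataFrame({
    "birth_weight": rng.normal(1500, 400, n),      # grams
    "gestational_age": rng.normal(30, 3, n),       # weeks
    "hospital_id": rng.integers(0, 20, n),
})

# Simulate mortality from a logistic model in the covariates.
logit = -3 + 0.002 * (1500 - df["birth_weight"]) + 0.15 * (30 - df["gestational_age"])
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = df[["birth_weight", "gestational_age"]]
y = df["died"]

# Risk models: hospital identity is deliberately excluded from the covariates,
# so the expected rate reflects patient mix only.
models = {
    "logistic": LogisticRegression(max_iter=1000).fit(X, y),
    "rf": RandomForestClassifier(n_estimators=500, min_samples_leaf=25,
                                 random_state=0).fit(X, y),
}
for name, model in models.items():
    df[f"risk_{name}"] = model.predict_proba(X)[:, 1]

# Observed-to-expected (O/E) mortality ratio per hospital under each risk model.
per_hospital = df.groupby("hospital_id").agg(
    observed=("died", "mean"),
    expected_logistic=("risk_logistic", "mean"),
    expected_rf=("risk_rf", "mean"),
)
per_hospital["oe_logistic"] = per_hospital["observed"] / per_hospital["expected_logistic"]
per_hospital["oe_rf"] = per_hospital["observed"] / per_hospital["expected_rf"]
print(per_hospital.round(3).head())
```

Hospitals with an O/E ratio well above 1 have more deaths than the risk model predicts for their patient mix; deciding how far above 1 should be flagged as unusual is a separate inferential question beyond this sketch.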

[1]  D. Rubin,et al.  The central role of the propensity score in observational studies for causal effects , 1983 .

[2]  L. Iezzoni Risk Adjustment for Measuring Healthcare Outcomes , 1994 .

[3]  J. Bronstein,et al.  The effects of patient volume and level of care at the hospital of birth on neonatal mortality. , 1996, JAMA.

[4]  J. Allison Risk adjustment for measuring health care outcomes , 1996 .

[5]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[6]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[7]  Thomas G. Dietterich Ensemble Methods in Machine Learning , 2000, Multiple Classifier Systems.

[8]  F. Provost,et al.  Well-Trained PETs : Improving Probability Estimation Trees , 2000 .

[9]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 2002 .

[10]  Robert P. Sheridan,et al.  Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling , 2003, J. Chem. Inf. Comput. Sci..

[11]  Nicole A. Lazar,et al.  Statistical Analysis With Missing Data , 2003, Technometrics.

[12]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[13]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[14]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[15]  Ramón Díaz-Uriarte,et al.  Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[16]  Aaron B Caughey,et al.  Level and volume of neonatal intensive care and mortality in very-low-birth-weight infants. , 2007, The New England journal of medicine.

[17]  D. R. Cutler,et al.  SelectedWorks, Utah State University , 2017 .

[18]  Sharon-Lise T. Normand,et al.  Statistical and Clinical Aspects of Hospital Outcomes Profiling , 2007, 0710.4622.

[19]  Henrik Boström,et al.  Estimating class probabilities in random forests , 2007, Sixth International Conference on Machine Learning and Applications (ICMLA 2007).

[21]  Peter C Austin,et al.  Bayes rules for optimally using Bayesian hierarchical regression models in provider profiling to identify high-mortality hospitals , 2008, BMC medical research methodology.

[22]  Joseph Sedransk,et al.  Bayesian and Frequentist Methods for Provider Profiling Using Risk-Adjusted Assessments of Medical Outcomes , 2010 .

[23]  Susan Groshen,et al.  Outlier detection for a hierarchical Bayes model in a study of hospital variation in surgical procedures , 2009, Statistical methods in medical research.

[24]  Elizabeth A Stuart,et al.  Matching methods for causal inference: A review and a look forward. , 2010, Statistical science : a review journal of the Institute of Mathematical Statistics.

[25]  H. Chipman,et al.  BART: Bayesian Additive Regression Trees , 2008, 0806.3286.

[26]  Thomas A Louis,et al.  Percentile‐based empirical distribution function estimates for performance evaluation of healthcare providers , 2011, Journal of the Royal Statistical Society. Series C, Applied statistics.

[27]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[28]  Howard A. Fine,et al.  Predicting in vitro drug sensitivity using Random Forests , 2011, Bioinform..

[29]  Dylan S. Small,et al.  The Differential Impact of Delivery Hospital on the Outcomes of Premature Infants , 2012, Pediatrics.

[30]  J. D. Malley,et al.  Probability Machines , 2011, Methods of Information in Medicine.

[31]  John D. Kalbfleisch,et al.  On Monitoring Outcomes of Medical Providers , 2013 .

[32]  G. Vittadini,et al.  Comparing health outcomes among hospitals: the experience of the Lombardy Region , 2013, Health Care Management Science.

[33]  Wei Wang,et al.  Template matching for auditing hospital cost and quality. , 2014, Health services research.

[34]  S. L. Normand,et al.  On the accuracy of classifying hospitals on their performance measures , 2014, Statistics in medicine.

[35]  Jon Atli Benediktsson,et al.  Multiple Classifier Systems , 2015, Lecture Notes in Computer Science.

[36]  Francesca Ieva,et al.  Detecting and visualizing outliers in provider profiling via funnel plots and mixed effect models , 2014, Health Care Management Science.

[37]  B. Manktelow,et al.  Comparison of four methods for deriving hospital standardised mortality ratios from a single hierarchical logistic regression model , 2016, Statistical methods in medical research.