Logistic regression had superior performance compared with regression trees for predicting in-hospital mortality in patients hospitalized with heart failure.

OBJECTIVE To compare the predictive accuracy of regression trees with that of logistic regression models for predicting in-hospital mortality in patients hospitalized with heart failure.

STUDY DESIGN AND SETTING Models were developed in 8,236 patients hospitalized with heart failure between April 1999 and March 2001. Models included the Enhanced Feedback for Effective Cardiac Treatment (EFFECT) and Acute Decompensated Heart Failure National Registry (ADHERE) regression models and tree. Predictive accuracy was assessed using 7,608 patients hospitalized between April 2004 and March 2005.

RESULTS The area under the receiver operating characteristic curve for five different logistic regression models ranged from 0.747 to 0.775, whereas the corresponding values for three different regression trees ranged from 0.620 to 0.651. For the regression trees grown in 1,000 random samples drawn from the derivation sample, the number of terminal nodes ranged from 1 to 6, whereas the number of variables used in specific trees ranged from 0 to 5. Three different variables (blood urea nitrogen, dementia, and systolic blood pressure) were used for defining the first binary split when growing regression trees.

CONCLUSION Logistic regression predicted in-hospital mortality in patients hospitalized with heart failure more accurately than did the regression trees. Regression trees grown in random samples from the same data set can differ substantially from one another.
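The two quantities reported above, the area under the receiver operating characteristic curve and the variable chosen for the first binary split across random samples, can be illustrated with a minimal standard-library Python sketch. It is not the study's software or data. It assumes the random samples are bootstrap samples (drawn with replacement, same size as the original), uses a least-squares splitting rule for a 0/1 outcome, and runs on synthetic patients whose variables (`bun`, `sbp`, `dementia`) and risk formula are hypothetical.

```python
import random
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    death (label 1) has a higher score than a randomly chosen survivor
    (label 0), with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_first_split(rows, labels):
    """Return (variable index, threshold) of the single binary split that
    minimizes the within-node sum of squared errors, or (None, None) if
    no variable takes more than one value."""
    n, total = len(labels), sum(labels)
    best = (float("inf"), None, None)
    for j in range(len(rows[0])):
        order = sorted(range(n), key=lambda i: rows[i][j])
        left_sum = 0
        for k in range(1, n):
            left_sum += labels[order[k - 1]]
            lo, hi = rows[order[k - 1]][j], rows[order[k]][j]
            if lo == hi:
                continue  # cannot split between two equal values
            right_sum = total - left_sum
            # For a 0/1 outcome, node SSE = sum(y) - sum(y)^2 / node size.
            sse = (left_sum - left_sum ** 2 / k) + (
                right_sum - right_sum ** 2 / (n - k)
            )
            if sse < best[0]:
                best = (sse, j, (lo + hi) / 2)
    return best[1], best[2]


def first_split_counts(rows, labels, n_samples, seed=0):
    """Count how often each variable defines the first split across
    bootstrap samples (assumption: sampling with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        var, _thr = best_first_split(
            [rows[i] for i in idx], [labels[i] for i in idx]
        )
        counts[var] += 1
    return counts


# Synthetic, hypothetical data: columns are (bun, sbp, dementia).
data_rng = random.Random(1)
rows, labels = [], []
for _ in range(200):
    bun = data_rng.gauss(30, 12)
    sbp = data_rng.gauss(130, 25)
    dementia = 1 if data_rng.random() < 0.1 else 0
    risk = (0.05 + 0.004 * max(bun - 30, 0)
            + 0.003 * max(120 - sbp, 0) + 0.1 * dementia)
    rows.append((bun, sbp, dementia))
    labels.append(1 if data_rng.random() < risk else 0)

counts = first_split_counts(rows, labels, n_samples=50)
print(counts)  # variable index -> number of samples where it split first
```

When several variables carry comparable signal, the counts are typically spread over more than one variable index, which is the kind of instability the results describe. Calling `auc` on predicted risks for a separate validation sample gives the discrimination measure used for the comparison.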
