An empirical comparison of multivariable methods for estimating risk of death from coronary heart disease.

BACKGROUND Logistic regression and, more recently, Cox regression have been the predominant methods for identifying risk factors and developing risk estimation equations for coronary heart disease (CHD). Software for the regression tree method is now available for both binary and survival outcomes and thus offers an alternative methodology. This paper compares these four methods (logistic regression, Cox regression, binary regression trees and survival regression trees) for identifying significant risk factors from among a set of candidate factors and for estimating the risk of death from CHD, using baseline and mortality follow-up data on 1,701 men participating in the Busselton Health Study. The candidate risk factors were age, body mass index, systolic and diastolic blood pressure, treatment for hypertension, cholesterol and smoking.

METHODS Logistic regression, Cox proportional hazards regression, binary regression tree, and survival regression tree analyses were applied to data from the same cohort of men to estimate and predict the risk of CHD death. The four methods were compared in terms of the variables selected, the goodness-of-fit of the models, the similarity of cross-validated estimated risks for individuals, and the ability to discriminate between those who died from CHD and those who did not during the follow-up period, including a comparison of Receiver Operating Characteristic (ROC) curves.

RESULTS Age and a blood pressure variable were selected by all four methods; in addition, body mass index was selected by the regression tree methods and smoking was selected by Cox regression. There was good, but not excellent, agreement between methods in the estimates of risk for individuals. The areas under the ROC curves were 0.66 for the binary tree, 0.72 for logistic regression, 0.71 for the survival tree method and 0.78 for Cox regression. The average differences in estimated risk between those who died from CHD and those who did not during the follow-up period were 0.051 for logistic regression, 0.070 for the binary tree method, 0.073 for the survival tree method and 0.088 for Cox regression.

CONCLUSION For a moderately sized cohort typical of many applications of these methods in the literature, the two methods that used the survival outcome performed better than the methods that used a binary outcome. Despite selecting some different variables and showing moderate differences in risk estimates for individuals, the two binary approaches were similar in performance. Cox regression appeared to be superior to the survival tree method, but further, larger studies that use completely separate samples for model development and for evaluation of prediction performance are required to confirm this finding.
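The two discrimination summaries reported in the results, the area under the ROC curve and the average difference in estimated risk between those who died and those who did not, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the six example risks are hypothetical, and the area is computed through its standard Mann-Whitney interpretation (the probability that a randomly chosen death has a higher estimated risk than a randomly chosen survivor, with ties counted as one half).

```python
from typing import List, Sequence, Tuple


def split_by_outcome(
    risks: Sequence[float], died: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Separate estimated risks into deaths (cases) and survivors (controls)."""
    cases = [r for r, d in zip(risks, died) if d]
    controls = [r for r, d in zip(risks, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    return cases, controls


def auc_mann_whitney(risks: Sequence[float], died: Sequence[bool]) -> float:
    """Area under the ROC curve as the proportion of (death, survivor) pairs
    in which the death has the higher estimated risk; ties count one half."""
    cases, controls = split_by_outcome(risks, died)
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def mean_risk_difference(risks: Sequence[float], died: Sequence[bool]) -> float:
    """Mean estimated risk among deaths minus mean estimated risk among survivors."""
    cases, controls = split_by_outcome(risks, died)
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Hypothetical cross-validated risks for six men (not Busselton data).
risks = [0.02, 0.05, 0.25, 0.30, 0.08, 0.20]
died = [False, False, False, True, False, True]

auc = auc_mann_whitney(risks, died)        # 7 of 8 pairs correctly ordered: 0.875
diff = mean_risk_difference(risks, died)   # approximately 0.15
print(f"AUC = {auc:.3f}, mean risk difference = {diff:.3f}")
```

The pairwise loop is quadratic in cohort size, which is acceptable for a cohort of this scale; a rank-based formula would be preferable for much larger samples.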
