Logistic Regression Model: An Assessment of Variability of Predictions

Risk prediction models available for cardiovascular prevention are either statistical or based on machine learning methods. This paper investigates whether logistic regression can be considered a reference method for validating other methods. To test the stability of its predictions, we performed two types of analysis on 50 random training and test samples drawn from the same database. In the first analysis, three models were obtained by forced entry of different sets of four variables. In the second analysis, models were built with an increasing number of predictive variables. Predictive performance was assessed by the area under the ROC curve. Although across-sample variability is low for a given model, it is large enough to lead to wrong conclusions when comparing different prediction methods. We also suggest that a low events-per-variable ratio alters the stability of a model's coefficients but does not affect the variability of prediction performance.
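The repeated-split procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code or data: the dataset is synthetic, the four predictors, their coefficients, the 70/30 split and the sample size are all assumptions, and the model is fitted by plain gradient descent rather than by whatever routine the study used. The names `make_data`, `fit_logistic` and `repeated_split_auc` are hypothetical helpers introduced only for this example.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, x):
    # w[0] is the intercept, w[1:] are the coefficients.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=1.0, epochs=50):
    """Fit a logistic model by batch gradient descent (an assumed fitter)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def auc(scores, labels):
    """Area under the ROC curve: P(score of an event > score of a non-event)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, rng):
    # Synthetic stand-in for the study database (assumption):
    # four standard-normal predictors with arbitrary true coefficients.
    true_w = [-1.5, 0.8, 0.5, 0.3, 0.0]
    X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(linear(true_w, xi)) else 0 for xi in X]
    return X, y


def repeated_split_auc(X, y, n_splits=50, test_frac=0.3, seed=0):
    """Refit the same model on n_splits random train/test partitions."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [sigmoid(linear(w, X[i])) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return aucs


X, y = make_data(300, random.Random(1))
aucs = repeated_split_auc(X, y)
print("mean AUC:", round(statistics.mean(aucs), 3))
print("SD:", round(statistics.stdev(aucs), 3))
print("range:", round(min(aucs), 3), "to", round(max(aucs), 3))
```

The spread between the smallest and largest test-sample AUC is the quantity of interest here: if it is comparable to the difference usually reported between two prediction methods, a comparison based on a single split could favour either method by chance.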
