Cook: Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction

The c statistic, or area under the receiver operating characteristic (ROC) curve, achieved popularity in diagnostic testing, in which the test characteristics of sensitivity and specificity are relevant to discriminating diseased versus nondiseased patients. The c statistic, however, may not be optimal in assessing models that predict future risk or stratify individuals into risk categories. In this setting, calibration is as important as discrimination to the accurate assessment of risk. For example, a biomarker with an odds ratio of 3 may have little effect on the c statistic, yet an increased level could shift estimated 10-year cardiovascular risk for an individual patient from 8% to 24%, which would lead to different treatment recommendations under current Adult Treatment Panel III guidelines. Accepted risk factors such as lipids, hypertension, and smoking have only marginal impact on the c statistic individually yet lead to more accurate reclassification of large proportions of patients into higher-risk or lower-risk categories. Perfectly calibrated models for complex disease can, in fact, only achieve values for the c statistic well below the theoretical maximum of 1. Use of the c statistic for model selection could thus naively eliminate established risk factors from cardiovascular risk prediction scores. As novel risk factors are discovered, sole reliance on the c statistic to evaluate their utility as risk predictors therefore seems ill-advised. (Circulation. 2007;115:928-935.)
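The claim that a perfectly calibrated model can still have a c statistic well below 1 can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes a hypothetical population whose true 10-year risks follow a Beta(1, 9) distribution (mean 10%), draws each outcome as a Bernoulli event with exactly that risk, and computes the rank-based c statistic. The distribution, sample size, seed, and function names are all illustrative assumptions.

```python
import random


def c_statistic(risks, outcomes):
    """Rank-based (Mann-Whitney) c statistic; tied risks receive midranks."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    ranks = [0.0] * len(risks)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied predicted risks.
        while j + 1 < len(order) and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n=20000, seed=0):
    """Draw true risks and outcomes; the 'model' predicts the true risk."""
    rng = random.Random(seed)
    # Assumed risk distribution (hypothetical): Beta(1, 9), mean 0.10.
    risks = [rng.betavariate(1, 9) for _ in range(n)]
    # Each outcome occurs with exactly the predicted probability,
    # so the predictions are perfectly calibrated by construction.
    outcomes = [1 if rng.random() < p else 0 for p in risks]
    return risks, outcomes


risks, outcomes = simulate()
mean_risk = sum(risks) / len(risks)
observed_rate = sum(outcomes) / len(outcomes)
c = c_statistic(risks, outcomes)
print(f"mean predicted risk: {mean_risk:.3f}")
print(f"observed event rate: {observed_rate:.3f}")
print(f"c statistic:         {c:.3f}")
```

Under this assumed distribution the expected c statistic works out analytically to about 0.76, even though every prediction equals the true risk. A wider spread of true risks would raise it and a narrower spread would lower it, which is why the c statistic alone says little about whether predicted risks are accurate.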
