Validating a prognostic model
See referenced original article on pages 2604–8, this issue.

Hupertan et al. have done a nice analysis of the accuracy of a previously published nomogram when applied to an external dataset. The authors used a sample of 565 men treated at a single institution in France and found that the performance of the nomogram, as measured by the concordance index (similar to the area under a receiver operating characteristic curve), dropped from 0.74 to 0.607. The authors suggest dynamic revision of renal cell cancer prediction models, although they acknowledge that the current nomogram may still be the best alternative.

It would be interesting to see a model-free assessment of the calibration of the nomogram. In Figure 2 of Hupertan et al., a scatter plot compares nomogram-predicted probabilities with predicted probabilities from a Cox model fit to the French data. The plot suggests that the 2 models’ predictions do not agree well; however, it would be of considerably more interest to determine how well the nomogram-predicted probabilities agree with actual patient outcome. This could be done by grouping patients with respect to their predicted probabilities and then comparing the mean prediction for each group with the actual (eg, Kaplan-Meier) estimate; a check of this kind is sketched below. Such a presentation would avoid the assumption that a Cox model fit to a dataset is a gold standard.

Although the assessment of the nomogram’s calibration could be questioned, the discrimination of the nomogram (ie, how well it ranks patients) is measured precisely by the concordance index (an illustration of its computation also appears below). In their initial publication, Kattan et al. reported a concordance index of 0.74. However, Hupertan et al. obtained a considerably lower value (0.607). Curiously, in a previous validation study, Cindolo et al. actually obtained a higher value (0.807). It would be convenient if a reliable threshold existed, above which a tool would be deemed clinically useful and below which it would not. However, that threshold is remarkably difficult to establish, although it is a subject of active methodological research. Without such a threshold, it is hard to know how to react when the concordance index varies as widely as it has across these studies.

What is the comparison? Because the concordance index will change at least to some degree with each new assessment, a direct comparison of alternatives greatly facilitates decision making. In other words, head-to-head comparison of prediction approaches applied to a common dataset yields concordance indices that are straightforward to compare. In that setup, in which the decision maker is forced to pick among alternative prediction approaches, he or she typically can pick the winner of the contest. This type of comparison was a major
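The grouped calibration check suggested above can be made concrete with a short sketch. The following Python snippet is illustrative only and is not drawn from either study; the column names, the number of groups, and the 60-month horizon are assumptions made for the example. It bins patients by their nomogram-predicted probability of remaining recurrence-free and compares each bin’s mean prediction with a Kaplan-Meier estimate of the observed outcome, with no Cox model serving as an intermediary.

```python
# Minimal sketch of a model-free, grouped calibration check (assumed column
# names and horizon; not the authors' code).
import pandas as pd
from lifelines import KaplanMeierFitter

def grouped_calibration(df, n_groups=4, horizon_months=60):
    """df needs: 'pred_prob' (nomogram probability of remaining recurrence-free
    at the horizon), 'time' (months of follow-up), 'event' (1 = recurrence)."""
    df = df.copy()
    # Split patients into groups of roughly equal size by predicted probability.
    df["group"] = pd.qcut(df["pred_prob"], q=n_groups, labels=False)
    rows = []
    for g, sub in df.groupby("group"):
        km = KaplanMeierFitter().fit(sub["time"], sub["event"])
        rows.append({
            "group": g,
            "mean_predicted": sub["pred_prob"].mean(),
            "observed_km": km.predict(horizon_months),  # KM estimate at horizon
            "n": len(sub),
        })
    return pd.DataFrame(rows)
```

Plotting mean_predicted against observed_km for the groups gives a calibration curve; points near the 45-degree line indicate that the nomogram’s predicted probabilities match what actually happened to patients.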
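For completeness, the discrimination measure discussed above, Harrell’s concordance index, can also be sketched. This is a simplified illustration (ties in follow-up time are ignored), not the implementation used in any of the cited studies; in practice a library routine such as lifelines.utils.concordance_index would normally be used.

```python
# Simplified sketch of Harrell's concordance index for censored survival data.
# Among usable patient pairs, it counts the fraction in which the patient
# given the higher risk score actually recurs first.
def concordance_index(times, events, risk_scores):
    concordant = 0.0
    permissible = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if patient i has an observed recurrence and
            # patient j's follow-up extends beyond patient i's event time.
            if events[i] == 1 and times[i] < times[j]:
                permissible += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / permissible
```

Here higher risk scores are assumed to indicate earlier expected recurrence; a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the 0.607, 0.74, and 0.807 figures above should be read.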
[1] Kattan M, et al. A postoperative prognostic nomogram for renal cell carcinoma. The Journal of Urology. 2001.
[2] Dawes R, et al. Heuristics and Biases: Clinical versus Actuarial Judgment. 2002.
[3] Chrétien Y, et al. Low predictive accuracy of the Kattan postoperative nomogram for renal cell carcinoma recurrence in a population of French patients. Cancer. 2006.