We thank Helena Chmura Kraemer for her interest in our paper. We agree that “the value of mathematical models in medical research does not lie in their proliferation, their elegance or complexity, but in their clinical usefulness.” Indeed, we expressly introduce some recently developed methods that can help determine whether use of a prediction model in practice would improve clinical outcome. Patients, clinicians, and researchers need interpretable measures that indicate the extent to which a model supports better decision-making. Our paper provides an overview of measures to assess the performance of predictions from a model, including calibration and discrimination measures, as well as measures that quantify how much a model improves decision-making.

Kraemer adds the weighted kappa, k(r), to the latter category of measures; k(r) is essentially a rescaled version of the net benefit originally proposed by Peirce, which we emphasized in our paper. Here, too, we largely agree, because both measures weight the classifications of prediction models in terms of their consequences. Is k(r) better than net benefit? We doubt it. Net benefit does depend on the incidence of the outcome (“P”), but this dependence is entirely appropriate if we want to operationalize clinical usefulness. As a simple example, a model has greater potential to be clinically useful if the outcome occurs in 50% of patients than if it occurs in only 0.01%. The receiver operating characteristic (ROC) curve with a “diagnosis line” is not particularly helpful either. We argue that displaying ROC curves is a waste of precious journal space unless predictions are shown on the curve, as in our original Figure 1.

We are puzzled by Kraemer’s concern that we “obscured the fact” that we used separate development and validation samples; this is indicated clearly throughout the paper. External validation is an important principle for evaluating prediction models. A model may be more or less useful at validation, depending on the incidence of the outcome, the distribution of predicted values, and the validity of the predicted probabilities.

Our main point of disagreement is what a “prediction” is. The principle of making predictions for binary outcomes in terms of probabilities has a long history, both in medicine and in other fields such as weather forecasting. Expressing the results of such predictions purely in binary terms, by contrast, is predominantly associated with “prophecies” that do or do not come true: a scientific weather forecast gives a “60% chance of rain”; a seer states “a blue-eyed child will ascend to the throne in a dark time.” A patient with a probability of 29% for an outcome cannot be given a meaningful answer to the question, “Well, do I (will I) have the outcome or not?” Kraemer’s advice to dichotomize to a binary prediction clearly results in a loss of information: 2 patients with different predicted probabilities of disease, say 2% and 24%, end up with the same information of “not diseased” if the threshold is above 24%. Predicted probabilities play an important role, such as in situations where monitoring of patients is an option, or when
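For readers less familiar with net benefit, a brief sketch may help; this illustration is ours, using the standard decision-analytic definition of net benefit employed in decision curve analysis rather than any formula from the original exchange. At a threshold probability p_t, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the numbers of true-positive and false-positive classifications among n patients. For a hypothetically perfect model (no false positives, and TP/n equal to the incidence), the maximum attainable net benefit equals the incidence itself: about 0.50 when the outcome occurs in half of patients, but only 0.0001 when it occurs in 1 of 10,000. This is why dependence on incidence is appropriate for a measure that aims to capture clinical usefulness.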
[1] Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer; 2009.
[2] Vergouwe Y, et al. Validity of prognostic models: when is a model clinically useful? Semin Urol Oncol. 2002.
[3] van Houwelingen HC. Validation, calibration, revision and combination of prognostic survival models. Stat Med. 2000.
[4] Habbema JD, et al. The measurement of performance in probabilistic diagnosis V. General recommendations. Methods Inf Med. 1981.
[5] Hilden J. Prevalence-free utility-respecting summary indices of diagnostic power do not exist. Stat Med. 2000.
[6] Cox DR. Two further applications of a model for binary regression. Biometrika. 1958.
[7] Kraemer HC. The usefulness of mathematical models in assessing medical tests. Epidemiology. 2010.
[8] Peirce CS. The numerical measure of the success of predictions. Science. 1884.
[9] Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010.
[10] Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer; 2001.
[11] Irwig L, et al. Assessing new biomarkers and predictive models for use in clinical practice: a clinician's guide. Arch Intern Med. 2008.
[12] Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006.