Calibration of EuroSCORE II.

We read with great interest the recent development and internal validation of EuroSCORE II [1], anticipating greater methodological rigour and better reporting than in the original study describing the development of EuroSCORE I [2]. In general, we feel that reporting has improved; however, like others, we find the description of the internal validation of EuroSCORE II somewhat confusing, and subsequent responses have highlighted flaws in the evaluation of its calibration [3, 4]. In particular, given that the authors randomly split the cohort into development and internal validation data sets, it is unclear why they included an undefined portion of the development data set when calculating the Hosmer–Lemeshow test statistic rather than using the validation data set alone. Furthermore, it is disappointing that a Hosmer–Lemeshow test giving a P-value that only just exceeded the Holy Grail of 0.05 was taken to indicate good calibration; this is far from convincing evidence that EuroSCORE II is well calibrated. Disappointingly, recent correspondence bringing this to the attention of interested readers committed the cardinal sin of associating statistical significance (P = 0.05) with clinical significance [3]. We urge authors to be clear when describing the results of a statistical test, particularly one without an associated effect measure, and to refrain from inferring clinical importance from the magnitude of a single P-value. Similarly, a second correspondent who highlighted flaws in the Hosmer–Lemeshow analysis made another disappointing statement, namely that ‘statistical results are either significant or non-significant, black or white, there is no grey in statistics’ [4]. Such a statement is misinformed and discouraging; statistics is entirely about this grey zone. The foundations of statistics lie in probability theory, where one is solely concerned with capturing uncertainty.
Where variability in measurements exists, we can never be entirely confident that our results are not merely an artefact of the data. Furthermore, describing statistical results as significant or non-significant is highly uninformative and should be avoided in favour of measures of effect, with uncertainty conveyed by appropriate confidence intervals. The Hosmer–Lemeshow test has limited usefulness and provides no information on any over- or under-prediction by a risk prediction model [5]. It would be more useful for authors to present calibration plots, and it is essential that they restrict this analysis to the validation data set. Such plots provide a highly informative graphical display of calibration and can be accompanied by an estimate of the calibration slope with a confidence interval (a slope of 1 with an intercept of 0 indicates perfect calibration) [6]. In the light of the unimpressive calibration results of EuroSCORE II, calibration plots (including plots of calibration by key prognostic factors such as age) would provide insight into where the model is miscalibrated.
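To make the calibration slope concrete, the following is a minimal sketch in Python and not the analysis performed by the EuroSCORE II authors. It assumes simulated validation data in which the true log-odds of death are 0.7 times the predicted log-odds; the function names (calibration_fit, calibration_table) and all numbers are hypothetical. The slope and intercept are estimated jointly by logistic regression of the outcome on the logit of the predicted risk, and the grouped table holds the values a calibration plot would display.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_and_information(a, b, x, y):
    """Score vector and information matrix for logit P(y = 1) = a + b * x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        mu = expit(a + b * xi)
        w = mu * (1.0 - mu)
        g0 += yi - mu
        g1 += (yi - mu) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11


def calibration_fit(pred_risk, outcome, max_iter=50, tol=1e-10):
    """Regress the outcome on logit(predicted risk) by Newton-Raphson.

    Returns the intercept, the slope and the standard error of the slope.
    Predicted risks must lie strictly between 0 and 1.
    """
    x = [logit(p) for p in pred_risk]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_information(a, b, x, outcome)
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Variance of the slope is the matching element of the inverse information.
    _, _, i00, i01, i11 = score_and_information(a, b, x, outcome)
    se_slope = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return a, b, se_slope


def calibration_table(pred_risk, outcome, groups=10):
    """Mean predicted risk and observed event rate in equal-sized risk groups."""
    pairs = sorted(zip(pred_risk, outcome))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Simulated validation set (an assumption for illustration only).
random.seed(1)
n = 20000
pred_lp = [random.gauss(-3.0, 1.0) for _ in range(n)]
pred = [expit(lp) for lp in pred_lp]
# Assumed truth: true log-odds = 0.7 * predicted log-odds (miscalibrated model).
died = [1 if random.random() < expit(0.7 * lp) else 0 for lp in pred_lp]

intercept, slope, se = calibration_fit(pred, died)
lo, hi = slope - 1.96 * se, slope + 1.96 * se
print(f"intercept {intercept:.3f}, slope {slope:.3f} (95% CI {lo:.3f} to {hi:.3f})")
for mean_pred, observed in calibration_table(pred, died):
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}")
```

A confidence interval for the slope that excludes 1 points to miscalibration, and the grouped rows show where predicted and observed risks diverge, which a single Hosmer–Lemeshow P-value cannot do. In practice the same table would be computed on the validation data set only, and could be repeated within strata of a prognostic factor such as age.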

[1]  P. Vukovic, et al. Calibration of the EuroSCORE II risk stratification model: is the Hosmer-Lemeshow test acceptable any more?, 2013, European journal of cardio-thoracic surgery: official journal of the European Association for Cardio-thoracic Surgery.

[2]  How well calibrated is EuroSCORE II?, 2013, European journal of cardio-thoracic surgery: official journal of the European Association for Cardio-thoracic Surgery.

[3]  Samer A M Nashef, et al. EuroSCORE II, 2012, European journal of cardio-thoracic surgery: official journal of the European Association for Cardio-thoracic Surgery.

[4]  N. Obuchowski, et al. Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures, 2010, Epidemiology.

[5]  Patrick S Romano, et al. Size matters to a model's fit, 2007, Critical care medicine.

[6]  S. Lemeshow, et al. European system for cardiac operative risk evaluation (EuroSCORE), 1999, European journal of cardio-thoracic surgery: official journal of the European Association for Cardio-thoracic Surgery.