Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model’s discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of ‘true’ performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution of the C-statistic, calibration slope, calibration-in-the-large, and the ratio of expected to observed events (E/O), and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was markedly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and we therefore recommend that these scales be used for meta-analysis. An illustrative example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
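To make the recommended approach concrete, the sketch below pools study-specific C-statistics on the logit scale and back-transforms the result. This is a minimal illustration, not the authors’ analysis: the input data are hypothetical, the delta-method standard errors and the DerSimonian-Laird estimator of the between-study variance are assumptions chosen for simplicity, and the prediction interval uses a normal approximation (a t-distribution with k − 2 degrees of freedom is often preferred). An analogous calculation applies to E/O on the log scale.

```python
# Minimal sketch: random-effects meta-analysis of C-statistics on the logit scale.
# All data below are hypothetical; the estimator choices are illustrative assumptions.
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical validation results: C-statistic and its standard error per study.
c_stat = np.array([0.72, 0.78, 0.69, 0.81, 0.75])
se_c   = np.array([0.020, 0.015, 0.030, 0.025, 0.018])

# Delta method: SE of logit(C) is approximately SE(C) / (C * (1 - C)).
y  = logit(c_stat)
se = se_c / (c_stat * (1.0 - c_stat))
v  = se ** 2

# DerSimonian-Laird estimate of the between-study variance tau^2.
w_fixed = 1.0 / v
y_fixed = np.sum(w_fixed * y) / np.sum(w_fixed)
q    = np.sum(w_fixed * (y - y_fixed) ** 2)
df   = len(y) - 1
c_dl = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (q - df) / c_dl)

# Random-effects pooled logit-C with its 95% confidence interval,
# back-transformed to the C-statistic scale.
w     = 1.0 / (v + tau2)
mu    = np.sum(w * y) / np.sum(w)
se_mu = np.sqrt(1.0 / np.sum(w))
ci    = inv_logit(np.array([mu - 1.96 * se_mu, mu + 1.96 * se_mu]))

# Approximate 95% prediction interval for performance in a new setting
# (normal approximation; widens the CI by the between-study variance).
half = 1.96 * np.sqrt(se_mu ** 2 + tau2)
pi   = inv_logit(np.array([mu - half, mu + half]))

print(f"Pooled C-statistic: {inv_logit(mu):.3f}  95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Approx. 95% prediction interval: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Working on the logit scale keeps the pooled estimate and its intervals inside (0, 1) after back-transformation, and, as the simulation results above indicate, makes the normality assumption for the between-study distribution far more plausible than pooling the raw C-statistic.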
