A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta‐analysis

The use of individual participant data (IPD) from multiple studies is an increasingly popular approach when developing a multivariable risk prediction model. The corresponding datasets, however, typically differ in important aspects, such as baseline risk. This has driven the adoption of meta-analytical approaches that appropriately deal with heterogeneity between study populations. Although these approaches provide an averaged prediction model across all studies, little guidance exists on how to apply this model to, or validate it in, new individuals or study populations outside the derivation data. We consider several approaches to developing a multivariable logistic regression model from an IPD meta-analysis (IPD-MA) with potential between-study heterogeneity. We also propose strategies for choosing a valid model intercept when the model is to be validated or applied to new individuals or study populations. These strategies can be implemented by the IPD-MA developers or by future model validators. Finally, we show how model generalizability can be evaluated using internal-external cross-validation when external validation data are lacking, and we extend our framework to count and time-to-event data. In an empirical evaluation, our results show how stratified estimation allows study-specific model intercepts, which can then inform the intercept to be used when applying the model in practice, even to a population not represented by the included studies. In summary, our framework allows the development (through stratified estimation), implementation in new individuals (through focused intercept choice), and evaluation (through internal-external validation) of a single, integrated prediction model from an IPD-MA in order to achieve improved model performance and generalizability.
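
The three steps named above (stratified estimation, intercept choice, internal-external cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated data, the function names (`fit_logistic`, `c_statistic`, `internal_external_cv`), and the choice of the unweighted mean of the study-specific intercepts as the intercept for a new population are all assumptions made for illustration. Each study is left out in turn, a logistic model with one intercept per remaining study and common slopes is fitted, and the held-out study is used to check discrimination and calibration-in-the-large.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Newton-Raphson fit of a logistic model; X must already hold any intercept columns."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # A tiny ridge term keeps the Hessian numerically invertible.
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def c_statistic(y, score):
    """Concordance statistic via the rank-sum formula (assumes no tied scores)."""
    ranks = np.argsort(np.argsort(score)) + 1
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def internal_external_cv(x, y, study):
    """Leave one study out, develop on the rest, validate on the omitted study."""
    results = []
    for held_out in np.unique(study):
        train = study != held_out
        studies = np.unique(study[train])
        # Stratified estimation: one intercept per study, slopes shared across studies.
        dummies = (study[train][:, None] == studies[None, :]).astype(float)
        beta = fit_logistic(np.column_stack([dummies, x[train]]), y[train])
        intercepts, slopes = beta[:len(studies)], beta[len(studies):]
        # Intercept choice (illustrative assumption): unweighted mean of study intercepts.
        alpha = intercepts.mean()
        lp = alpha + x[~train] @ slopes
        pred = 1.0 / (1.0 + np.exp(-lp))
        results.append({
            "held_out": int(held_out),
            "c_statistic": float(c_statistic(y[~train], lp)),
            "mean_observed": float(y[~train].mean()),
            "mean_predicted": float(pred.mean()),
        })
    return results

# Simulated IPD (hypothetical): four studies that differ only in baseline risk.
rng = np.random.default_rng(0)
true_intercepts = np.array([-1.5, -1.0, -0.5, 0.0])
true_slopes = np.array([0.8, -0.5])
study = np.repeat(np.arange(4), 500)
x = rng.normal(size=(study.size, 2))
p_true = 1.0 / (1.0 + np.exp(-(true_intercepts[study] + x @ true_slopes)))
y = rng.binomial(1, p_true)

results = internal_external_cv(x, y, study)
for r in results:
    print(r)
```

In this simulation the c-statistic does not depend on the chosen intercept, whereas the gap between mean observed and mean predicted risk does: the studies with the most extreme baseline risk are the ones the averaged intercept calibrates worst, which is the situation a more focused intercept choice is meant to address.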
