Assessing Risk Prediction Models Using Individual Participant Data From Multiple Studies

Individual participant time-to-event data from multiple prospective epidemiologic studies enable detailed investigation into the predictive ability of risk models. Here we address the challenges in appropriately combining such information across studies. Methods are exemplified by analyses of log C-reactive protein and conventional risk factors for coronary heart disease in the Emerging Risk Factors Collaboration, a collation of individual data from multiple prospective studies with an average follow-up duration of 9.8 years (dates varied). We derive risk prediction models using Cox proportional hazards regression analysis stratified by study and obtain two estimates of risk discrimination, Harrell's concordance index and Royston's discrimination measure, within each study; we then combine the estimates across studies using a weighted meta-analysis. We compare various weighting approaches and recommend weighting by the number of events in each study. We also discuss the calculation of measures of reclassification for multiple studies. We further show that comparison of differences in predictive ability across subgroups should be based only on within-study information and that combining measures of risk discrimination from case-control studies and prospective studies is problematic. The concordance index and discrimination measure gave qualitatively similar results throughout. While the concordance index was very heterogeneous between studies, principally because of differing age ranges, the increments in the concordance index from adding log C-reactive protein to conventional risk factors were more homogeneous.
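The two-stage approach described in the abstract (estimate discrimination within each study, then pool with weights proportional to the number of events) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names harrell_c and pool_by_events, the handling of tied times, the independence assumption behind the pooled standard error, and all numbers are assumptions introduced here for illustration.

```python
from math import sqrt


def harrell_c(times, events, scores):
    """Harrell's concordance index for a single study.

    A pair is usable when the participant with the shorter follow-up
    time had an event. Pairs with tied times are skipped (a
    simplification); tied risk scores count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored participants cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs in this study")
    return concordant / usable


def pool_by_events(estimates, std_errors, n_events):
    """Pool within-study estimates with weights proportional to events.

    Returns (pooled estimate, standard error). The standard error
    assumes the study-specific estimates are independent.
    """
    total = sum(n_events)
    if total <= 0:
        raise ValueError("at least one event is required")
    weights = [d / total for d in n_events]
    pooled = sum(w * e for w, e in zip(weights, estimates))
    variance = sum((w * se) ** 2 for w, se in zip(weights, std_errors))
    return pooled, sqrt(variance)


# Toy single-study data (hypothetical): higher score means higher risk.
toy_c = harrell_c(
    times=[1, 2, 3, 4],
    events=[1, 1, 0, 1],
    scores=[4.0, 3.0, 2.0, 1.0],
)

# Hypothetical within-study C-indices, standard errors, and event counts.
pooled_c, pooled_se = pool_by_events(
    estimates=[0.70, 0.80],
    std_errors=[0.02, 0.04],
    n_events=[100, 300],
)
```

The same pooling function could be applied to within-study changes in the concordance index (for example, after adding a new marker to a base model), since it only needs a point estimate, a standard error, and an event count per study.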

Mark Woodward | Shah Ebrahim | Johan Sundström | Georg Schett | Hans L. Hillege | Ying Zhang | Simon G. Thompson | Vilmundur Gudnason | Lewis H. Kuller | Hermann Brenner | Stephen Kaptoge | Paul M. Ridker | Altan Onat | Ralph B. D'Agostino | JoAnn E. Manson | Steven Shea | Giel Nijpels | Mary Cushman | Ron T. Gansevoort | Stephan J. L. Bakker | Brendan M. Buckley | Else-Marie Bladbjerg | Anne Tybjærg-Hansen | Philippe Amouyel | Jean Ferrières | Nicholas J. Wareham | Albert Hofman | Philip Haycock | John Danesh | Veikko Salomaa | Mika Kivimäki | Nancy Cook | David J. Couper | Yutaka Kiyohara | Ian R. White | Michael Marmot | Erkki Vartiainen | Jukka T. Salonen | Christie M. Ballantyne | Kay-Tee Khaw | Richard W. Morris | Stela McLachlan | Peter Willeit | Johann Willeit | Jackie F. Price | Jussi Kauhanen | Gunnar Sigurdsson | Eric Brunner | Adam S. Butterworth | Angela Döring | Bruce M. Psaty | Josef Coresh | Angela M. Wood | John Gallacher | Pei Gao | Thor Aspelund | Daichi Shimbo | Oscar H. Franco | Naveed Sattar | Debbie A. Lawlor | Robert Clarke | Jonathan A. Shaffer | Rory Collins | Torben Jørgensen | Pekka Jousilahti | Ben Schöttker | Benoît Lamarche | Ian Ford | Kennet Harald | Aaron R. Folsom | Maryam Kavousi | Eric B. Rimm | Jan-Håkan Jansson | Michele Robertson | Elizabeth Barrett-Connor | Jacqueline M. Dekker | Coen D. A. Stehouwer | Stefan Kiechl | James Shepherd | Alun Evans | Dominique Arveiler | Lisa Pennells | Robert W. Tipping | S. Goya Wannamethee | John W. Yarnell | Russ Tracy | Amanda J. Lee | Heiko Müller | Patrik Wennberg | Hisatomi Arima | Yasufumi Doi | Toshiharu Ninomiya | Tom W. Meade | Jackie A. Cooper | Greg Grandits | Richard F. Gillum | Michael Mussolino | Sue E. Hankinson | Jennifer K. Pai | Susan Kirkland | Rudi G. Westendorp | Bernard Cantin | Deborah L. Wingard | Richele Bettencourt | Bolli Thorsson | Jacqueline C. Witteman | Barbara V. Howard | Lyle Best | Jason G. Umans | J. Michael Gaziano | Meir Stampfer | Astrid Fletcher | Martin Shipley | Julie Buring | Stuart M. Cobbe | Matthew Walker | Sarah Watson | Myriam Alexander | Emanuele Di Angelantonio | David Wormser
