Prediction models for clustered data: comparison of a random intercept and standard regression model

Background
When study data are clustered, standard regression analysis is considered inappropriate, and analytical techniques for clustered data need to be used. For prediction research in which the interest lies in predictor effects at the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that random effect parameter estimates differ from standard logistic regression parameter estimates. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions.

Methods
Using an empirical study of 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models with either standard or random intercept logistic regression. The external validity of these models was assessed in new patients treated by other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted to the clustered data structure were estimated.

Results
The model developed with random effect analysis showed better discrimination than the standard approach if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index of 0.68 versus 0.67). The simulation study confirmed these results.
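For a logistic random intercept model, the ICC is commonly defined on the latent scale as ICC = σ_u² / (σ_u² + π²/3), where σ_u² is the random intercept variance and π²/3 is the variance of the standard logistic distribution. The abstract does not state which definition was used, so this is an assumption. The minimal sketch below simulates clustered binary outcomes from a hypothetical random intercept model with one patient-level predictor and compares the c-index (the probability that a patient with the event has a higher predicted risk than a patient without it) of predictions that include the cluster effect with predictions that set it to zero. The helpers `sigma_u_for_icc`, `c_index`, and `simulate`, the coefficients, cluster sizes, and seed are all illustrative assumptions; true parameters stand in for fitted ones, whereas in practice they would be estimated, for example with `glmer` from the R package lme4.

```python
import math
import random

LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3  # variance of the standard logistic distribution


def sigma_u_for_icc(icc):
    """Random-intercept SD that yields a given latent-scale ICC (assumed definition)."""
    return math.sqrt(icc / (1.0 - icc) * LOGISTIC_RESIDUAL_VAR)


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def c_index(y, p):
    """Share of (event, non-event) pairs where the event has the higher predicted risk; ties count 1/2."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            score += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return score / (len(events) * len(nonevents))


def simulate(icc, n_clusters=20, n_per_cluster=50, b0=-0.5, b1=0.8, seed=1):
    """Hypothetical model: logit P(y=1) = b0 + b1*x + u_j, with u_j ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    sigma_u = sigma_u_for_icc(icc)
    y, p_cond, p_zero = [], [], []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sigma_u)  # cluster effect (e.g. one anesthesiologist)
        for _ in range(n_per_cluster):
            x = rng.gauss(0.0, 1.0)  # patient-level predictor
            risk = expit(b0 + b1 * x + u)
            y.append(1 if rng.random() < risk else 0)
            p_cond.append(risk)  # prediction that uses the cluster effect
            p_zero.append(expit(b0 + b1 * x))  # cluster effect set to 0 (average cluster)
    return y, p_cond, p_zero


for icc in (0.05, 0.15, 0.30):
    y, p_cond, p_zero = simulate(icc)
    print(f"ICC={icc:.2f}: c-index with cluster effect {c_index(y, p_cond):.3f}, "
          f"without {c_index(y, p_zero):.3f}")
```

Because the zero-effect predictions rank patients by the patient-level predictor only, they cannot capture between-cluster differences in risk, so the gap between the two c-indices would be expected to widen as the ICC grows. This mirrors the direction of the reported finding but does not reproduce the study's analysis.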
For datasets with a high ICC (≥15%), model calibration in external subjects was adequate only if the performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the model developed with standard regression, and calibration measures adapted to the clustered data structure showed good calibration for the prediction model with random intercept.

Conclusion
The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
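A standard way to summarise calibration is to regress the observed outcome on the logit of the predicted risk: an intercept near 0 and a slope near 1 indicate good calibration, and a slope below 1 suggests predictions that are too extreme. The minimal, dependency-free sketch below fits that calibration intercept and slope by Newton-Raphson and adds a per-cluster calibration-in-the-large (observed event rate minus mean predicted risk in each cluster) as one simple way to look at calibration within clusters. The function names are hypothetical, and the per-cluster summary is an illustrative choice, not necessarily the cluster-adapted calibration measure used in the study.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y, p, iters=25):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson and return (a, b)."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for yi, xi in zip(y, lp):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu  # score for the intercept a
            g1 += (yi - mu) * xi  # score for the slope b
            i00 += w  # Fisher information entries
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step (a, b) += I^{-1} * score, with the 2x2 inverse written out
        a += (i11 * g0 - i01 * g1) / det
        b += (i00 * g1 - i01 * g0) / det
    return a, b


def citl_by_cluster(y, p, cluster):
    """Calibration-in-the-large per cluster: observed event rate minus mean predicted risk."""
    totals = {}
    for yi, pi, ci in zip(y, p, cluster):
        obs, exp_, n = totals.get(ci, (0.0, 0.0, 0))
        totals[ci] = (obs + yi, exp_ + pi, n + 1)
    return {c: (obs - exp_) / n for c, (obs, exp_, n) in totals.items()}


# Hypothetical example: risk groups predicted at 0.2 and 0.8, observed at 0.1 and 0.7
y = [1] * 1 + [0] * 9 + [1] * 7 + [0] * 3
p = [0.2] * 10 + [0.8] * 10
a, b = calibration_intercept_slope(y, p)
print(f"calibration intercept {a:.3f}, slope {b:.3f}")
print(citl_by_cluster(y, p, ["A"] * 5 + ["B"] * 5 + ["A"] * 5 + ["B"] * 5))
```

If predictions omit the cluster effects, the per-cluster differences would tend to spread out as between-cluster variation grows, even when the overall intercept and slope look acceptable; this is one way to see why the choice of calibration measure matters for clustered data.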