A case study on the choice, interpretation and checking of multilevel models for longitudinal binary outcomes.

Recent advances in statistical software have led to the rapid diffusion of new methods for modelling longitudinal data. Multilevel (also known as hierarchical or random effects) models for binary outcomes have generally been based on a logistic-normal specification, by analogy with earlier work for normally distributed data. The appropriate application and interpretation of these models remain somewhat unclear, especially when compared with the computationally more straightforward semiparametric or 'marginal' modelling (GEE) approaches. In this paper we pose two interrelated questions. First, what limits should be placed on the interpretation of the coefficients and inferences derived from random-effects models involving binary outcomes? Second, what diagnostic checks are appropriate for evaluating whether such random-effects models provide adequate fits to the data? We address these questions by means of an extended case study using data on adolescent smoking from a large cohort study. Bayesian estimation methods are used to fit a discrete-mixture alternative to the standard logistic-normal model, and posterior predictive checking is used to assess model fit. Surprising parallels in the parameter estimates from the logistic-normal and mixture models are described and used to question the interpretability of the so-called 'subject-specific' regression coefficients from the standard multilevel approach. Posterior predictive checks suggest a serious lack of fit of both multilevel models. The results do not provide final answers to the two questions posed, but we expect that lessons learned from the case study will provide general guidance for further investigation of these important issues.
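
The distinction between 'subject-specific' and 'marginal' coefficients can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the analysis from the case study: it assumes a logistic-normal model with a random intercept only, and the parameter values (`alpha`, `beta`, `sigma`) and helper names are hypothetical, chosen purely for illustration. It averages the subject-level probabilities over the random-intercept distribution and compares the resulting population-averaged log odds ratio with the subject-specific one.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_grid=2001, width=8.0):
    """Population-averaged P(y = 1): the mean of expit(eta + b) over
    b ~ Normal(0, sigma^2), computed with the trapezoid rule."""
    if sigma == 0.0:
        return expit(eta)
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * step
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * expit(eta + b) * density
    return total * step


# Hypothetical values: intercept, subject-specific log odds ratio for a
# binary covariate, and standard deviation of the random intercept.
alpha, beta, sigma = -1.0, 1.0, 2.0

p0 = marginal_prob(alpha, sigma)          # covariate = 0, averaged over subjects
p1 = marginal_prob(alpha + beta, sigma)   # covariate = 1, averaged over subjects
beta_marginal = logit(p1) - logit(p0)     # population-averaged log odds ratio

print(f"subject-specific log odds ratio:    {beta:.3f}")
print(f"population-averaged log odds ratio: {beta_marginal:.3f}")
```

With a non-zero random-intercept variance the population-averaged log odds ratio is closer to zero than the subject-specific one, and the two coincide when `sigma` is zero. The sketch only shows that the two quantities answer different questions; it does not address whether either model fits a given data set.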
