Estimation of covariate effects in generalized linear mixed models with informative cluster sizes.

Standard regression analyses of clustered data typically assume that the expected value of the response does not depend on cluster size. This assumption often fails. For example, studies of surgical interventions have frequently found that both surgical volume and outcomes are related to the skill level of the surgeon. This paper examines how ignoring response-dependent (informative) cluster sizes affects standard analytical methods, such as mixed-effects models and conditional likelihood methods, using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects, which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes. We show that these methods exhibit little bias in estimating the effects of covariates that are uncorrelated with the random effects associated with cluster sizes, whereas estimation of the effects of covariates that are associated with those random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in the estimation of all covariate effects.
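The random-intercept case can be illustrated with a small simulation. The sketch below is not the paper's analysis: it assumes a linear model with a normal random intercept, a Poisson cluster-size model that shares that intercept, and a pooled least-squares fit standing in for a likelihood-based fit that ignores cluster size. The function name `simulate_pooled_fit` and all parameter values (`gamma`, the 0.5 offset in the size model, the number of clusters) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_pooled_fit(n_clusters=5000, beta0=0.0, beta1=1.0, gamma=0.5, seed=0):
    """Simulate clustered data whose cluster size shares a random intercept
    with the response, then fit a pooled regression that ignores cluster size.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Random intercept shared by the response model and the cluster-size model.
    u = rng.normal(0.0, 1.0, size=n_clusters)

    # Informative cluster size: clusters with larger u tend to be larger.
    sizes = 1 + rng.poisson(np.exp(0.5 + gamma * u))

    # Expand to one row per observation.
    cluster = np.repeat(np.arange(n_clusters), sizes)
    x = rng.normal(size=cluster.size)  # covariate independent of u
    y = beta0 + beta1 * x + u[cluster] + rng.normal(size=cluster.size)

    # Pooled least squares on (1, x), ignoring the clustering and the sizes.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


intercept_hat, slope_hat = simulate_pooled_fit()
print(f"intercept estimate: {intercept_hat:.3f} (true value 0.0)")
print(f"slope estimate:     {slope_hat:.3f} (true value 1.0)")
```

Under these assumptions the slope estimate should land near its true value, while the intercept estimate is pulled upward because large clusters, which contribute more observations, tend to have larger random intercepts. Setting `gamma=0` makes cluster size uninformative and should remove the intercept bias.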
