Methods for Observed-Cluster Inference When Cluster Size Is Informative: A Review and Clarifications

Clustered data commonly arise in epidemiology. We assume each cluster member has an outcome Y and covariates X. When some values of Y are missing, the distribution of Y given X in all cluster members (“complete clusters”) may differ from the distribution in only those members with observed Y (“observed clusters”). Often the former is of interest, but when data are missing because in a fundamental sense Y does not exist (e.g., quality of life for a person who has died), the latter may be more meaningful (quality of life conditional on being alive). Weighted and doubly weighted generalized estimating equations and shared random-effects models have been proposed for observed-cluster inference when cluster size is informative, that is, when the distribution of Y given X in observed clusters depends on observed cluster size. We show that these methods can be seen as actually giving inference for complete clusters and may not also give observed-cluster inference. This is true even if observed clusters are complete in themselves rather than being the observed part of larger complete clusters: in that case the methods may describe imaginary complete clusters rather than the observed clusters. We show under which conditions shared random-effects models proposed for observed-cluster inference do actually describe members with observed Y. A psoriatic arthritis dataset is used to illustrate the danger of misinterpreting estimates from shared random-effects models.
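
To make "informative cluster size" concrete, the following is a minimal sketch rather than the authors' method. It assumes an intercept-only model with an independence working correlation, in which case a weighted estimating equation with inverse-cluster-size weights reduces to a weighted mean. The toy data and the function names `member_mean` and `cluster_weighted_mean` are hypothetical, chosen only so that outcomes are higher in larger clusters.

```python
# Hypothetical toy data: each inner list holds the observed outcomes Y
# for one cluster. Outcomes are higher in the larger clusters, so
# observed cluster size is informative about Y.
clusters = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0],
    [0.0],
]


def member_mean(clusters):
    """Unweighted mean: every observed member counts equally."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)


def cluster_weighted_mean(clusters):
    """Inverse-cluster-size weighted mean.

    Each member of cluster i gets weight 1/n_i, so every cluster
    contributes total weight 1 regardless of its size.
    """
    numerator = sum(y / len(c) for c in clusters for y in c)
    denominator = sum(1 / len(c) for c in clusters for _ in c)
    return numerator / denominator


unweighted = member_mean(clusters)          # 7 of 9 members have Y = 1
weighted = cluster_weighted_mean(clusters)  # 2 of 4 clusters have mean Y = 1
print(unweighted, weighted)
```

The two estimates differ (about 0.78 versus 0.5) only because size and outcome are related in these toy data. The unweighted mean describes a randomly chosen observed member, while the weighted mean describes a member drawn from a randomly chosen cluster; which population a given method actually targets is the question the abstract raises.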
