A robust pairwise likelihood method for incomplete longitudinal binary data arising in clusters

Clustered longitudinal data feature cross-sectional associations within clusters, serial dependence within subjects, and associations between responses at different time points from different subjects within the same cluster. Generalized estimating equations are often used for inference with such data because they do not require full specification of the response model. When data are incomplete, however, generalized estimating equations require the data to be missing completely at random unless inverse probability weights are introduced based on a model for the missing data process. The authors propose a robust approach for incomplete clustered longitudinal data using composite likelihood. Specifically, pairwise likelihood methods are described for conducting robust estimation with minimal model assumptions. The authors also show that the resulting estimates remain valid for a wide variety of missing data problems, including missing at random mechanisms, so in such cases there is no need to model the missing data process. In addition to describing the asymptotic properties of the resulting estimators, the authors show through simulation studies that the method performs well empirically for complete and incomplete data. Pairwise likelihood estimators are also compared with estimators obtained from inverse probability weighted alternating logistic regression. An application to data from the Waterloo Smoking Prevention Project is provided for illustration.

The Canadian Journal of Statistics 39: 34–51; 2011 © 2010 Statistical Society of Canada
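To make the idea of a pairwise likelihood concrete, the following is a minimal sketch for complete binary data in a single cluster. It is not the authors' implementation: the function names, the use of one common odds ratio `psi` for every pair, and the restriction to complete data are all assumptions made for illustration. Each pair of responses contributes the log of its bivariate probability, which is built from the two marginal means and the pairwise odds ratio using the standard Plackett-type formula.

```python
import math
from itertools import combinations


def joint_p11(mu1, mu2, psi):
    """P(Y1 = 1, Y2 = 1) given marginal means mu1, mu2 and odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:
        return mu1 * mu2  # odds ratio of 1 corresponds to independence
    a = 1.0 + (mu1 + mu2) * (psi - 1.0)
    disc = a * a - 4.0 * psi * (psi - 1.0) * mu1 * mu2
    # The smaller root is the one that yields valid cell probabilities.
    return (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))


def pair_prob(y1, y2, mu1, mu2, psi):
    """Bivariate probability P(Y1 = y1, Y2 = y2) for binary y1, y2."""
    p11 = joint_p11(mu1, mu2, psi)
    table = {
        (1, 1): p11,
        (1, 0): mu1 - p11,
        (0, 1): mu2 - p11,
        (0, 0): 1.0 - mu1 - mu2 + p11,
    }
    return table[(y1, y2)]


def pairwise_loglik(y, mu, psi):
    """Sum of log bivariate probabilities over all pairs in one cluster.

    y   : list of 0/1 responses (assumed complete in this sketch)
    mu  : list of marginal means, one per response
    psi : common pairwise odds ratio (an illustrative simplification)
    """
    total = 0.0
    for j, k in combinations(range(len(y)), 2):
        total += math.log(pair_prob(y[j], y[k], mu[j], mu[k], psi))
    return total
```

In practice the marginal means would come from a regression model and the association parameters could differ by pair type (for example, within subject versus between subjects); maximizing the summed pairwise log-likelihood over those parameters gives the composite likelihood estimates.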
