Analysis of Partially Observed Clustered Data using Generalized Estimating Equations and Multiple Imputation

Clustered data arise in many settings, particularly within the social and biomedical sciences. For example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies, where researchers use various informants (for instance, parents and adolescents) to provide a holistic view of a subject's symptoms. Fitzmaurice et al. (1995, American Journal of Epidemiology 142: 1194–1203) have described estimation of multiple-source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data because additional stages of consent and assent are required. The usual GEE is unbiased when data are missing completely at random in the sense of Little and Rubin (2002, Statistical Analysis with Missing Data [Wiley]). This is a strong assumption that may not be tenable. Other options, such as the weighted GEE, are computationally challenging when missingness is nonmonotone. Multiple imputation is an attractive method for fitting models to incomplete data while requiring only the less restrictive missing-at-random assumption. Previously, fitting models to partially observed clustered data was computationally challenging; however, recent developments in Stata have made these methods easier to use in practice. We demonstrate how to use multiple imputation in conjunction with a GEE to investigate the prevalence of eating disorder symptoms in adolescents as reported by parents and adolescents and to determine the factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children, a cohort study that enrolled more than 14,000 pregnant mothers in 1991–92 and has followed the health and development of their children at regular intervals. Although point estimates for the missing-at-random model were fairly similar to those for the GEE under missing completely at random, the missing-at-random model had smaller standard errors and required less stringent assumptions regarding missingness.
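As a rough illustration of how multiple imputation is combined with a GEE, the sketch below pools one coefficient across imputed datasets using Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and its variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the study. The abstract's own analysis is carried out in Stata, so this Python version is only a minimal stand-in for the pooling step.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: per-imputation point estimates (e.g. a GEE coefficient)
    variances: per-imputation squared standard errors of that coefficient
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    # Pooled point estimate: mean of the per-imputation estimates.
    q_bar = sum(estimates) / m
    # Within-imputation variance: mean of the per-imputation variances.
    w_bar = sum(variances) / m
    # Between-imputation variance: sample variance of the estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance, with the (1 + 1/m) correction for finite m.
    total = w_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total)


# Hypothetical log-odds estimates from three imputed datasets.
pooled, se = pool_rubin([0.10, 0.20, 0.30], [0.01, 0.01, 0.01])
print(f"pooled estimate = {pooled:.3f}, standard error = {se:.3f}")
```

In practice each entry of `estimates` would come from fitting the same GEE to one completed dataset; the pooled standard error reflects both sampling variability and the uncertainty due to the missing values.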
