An R package for model fitting, model selection and the simulation for longitudinal data with dropout missingness

Abstract Missing data arise frequently in clinical and epidemiological research, particularly in longitudinal studies. This paper describes the core features of the R package wgeesel, which implements marginal model fitting (i.e., weighted generalized estimating equations, WGEE; doubly robust GEE) for longitudinal data with dropouts under the missing-at-random assumption. More importantly, the package provides a comprehensive set of existing information criteria for WGEE model selection on the marginal mean or correlation structure. It can also serve as a tool for simulating longitudinal data with missing outcomes. Lastly, a real data example and simulations are presented to illustrate and validate the package.
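
To make the weighting idea behind WGEE concrete, the following is a minimal Python sketch and not the wgeesel API. It assumes monotone dropout and that each subject's conditional probability of remaining in the study at each visit is already known or has been estimated. The function names `dropout_weights` and `weighted_mean` are hypothetical, chosen for illustration only. Each observed outcome is weighted by the inverse of the cumulative probability of still being observed, which is the usual inverse-probability construction under missing at random.

```python
def dropout_weights(stay_probs):
    """Cumulative inverse-probability weights for one subject (monotone dropout).

    stay_probs[j] is the assumed probability that the subject is observed at
    visit j, given that it was observed at visit j - 1.
    """
    weights = []
    cumulative = 1.0
    for p in stay_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("stay probabilities must lie in (0, 1]")
        cumulative *= p                    # probability of still being observed
        weights.append(1.0 / cumulative)   # inverse-probability weight
    return weights


def weighted_mean(outcomes, weights):
    """Weighted mean over observed outcomes; None marks a missing value."""
    observed = [(y, w) for y, w in zip(outcomes, weights) if y is not None]
    if not observed:
        raise ValueError("no observed outcomes")
    total_weight = sum(w for _, w in observed)
    return sum(y * w for y, w in observed) / total_weight


if __name__ == "__main__":
    # Hypothetical subject: always seen at visit 1, 50% chance of staying afterwards.
    w = dropout_weights([1.0, 0.5, 0.5])
    print(w)
    # The third visit is missing because the subject dropped out.
    print(weighted_mean([1.0, 3.0, None], w))
```

In a full WGEE fit these weights would enter the estimating equations for the regression parameters rather than a simple mean; the intercept-only case is used here only to keep the sketch short.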
