Analysis of incomplete longitudinal binary data using multiple imputation

We propose a propensity score-based multiple imputation (MI) method for handling incomplete data that result from drop-outs and/or intermittently skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are compared via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails because of convergence problems, whereas MI does not. Finally, if the data are missing at random, the CC and GEE methods yield results with moderate to large bias, while MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.
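As an illustration of how results from several imputed data sets are typically combined, the following is a minimal sketch of Rubin's combining rules. The abstract does not state which pooling rule the proposed method uses, so applying Rubin's rules here is an assumption; the function name `pool_rubin` and the input values are hypothetical and not taken from the paper.

```python
def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set (hypothetical inputs)
    variances: the corresponding within-imputation variance estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    # Pooled point estimate: average over the m imputed data sets.
    q_bar = sum(estimates) / m

    # Within-imputation variance: average of the per-data-set variances.
    w_bar = sum(variances) / m

    # Between-imputation variance: sample variance of the point estimates.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)

    # Total variance: within plus between, inflated by (1 + 1/m) for finite m.
    total = w_bar + (1 + 1 / m) * b
    return q_bar, total


# Made-up numbers for three imputations, for illustration only.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what lets the pooled variance reflect uncertainty due to the missing values, which a single imputation would understate.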
