Missing data in clinical studies: issues and methods.

Missing data are a prevalent problem in any type of data analysis. A variable (outcome or covariate) is considered missing for a participant if its value for that participant is not observed. In this article, various issues in analyzing studies with missing data are discussed. In particular, we focus on missing response and/or covariate data for studies with discrete, continuous, or time-to-event end points in which generalized linear models, models for longitudinal data such as generalized linear mixed effects models, or Cox regression models are used. We discuss various classifications of missing data that may arise in a study and demonstrate in several situations that the commonly used method of throwing out all participants with any missing data (complete-case analysis) may lead to incorrect results and conclusions. The methods described are applied to data from an Eastern Cooperative Oncology Group phase II clinical trial of liver cancer and a phase III clinical trial of advanced non-small-cell lung cancer. Although the main area of application discussed here is cancer, the issues and methods we discuss apply to any type of study.
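
The point about discarding participants with missing data can be illustrated with a small simulation. The sketch below is not taken from the article: the data-generating model, the missingness probabilities (0.8 and 0.1), and the names `simulate`, `full_mean`, and `cc_mean` are all hypothetical choices made for illustration. The outcome `y` has a true mean of 0, but it is more likely to be missing when an always-observed covariate `x` is positive, so averaging only the participants with an observed outcome is expected to give a clearly negative estimate (about -0.5 under these assumed settings).

```python
import random


def simulate(n=20000, seed=0):
    """Return (mean of y over everyone, mean of y over complete cases only).

    Hypothetical setup, not from the article: x ~ N(0, 1), y = x + N(0, 1),
    and y is missing with a probability that depends only on the observed x.
    """
    rng = random.Random(seed)
    all_y = []
    observed_y = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # covariate, always observed
        y = x + rng.gauss(0.0, 1.0)  # outcome, true mean 0
        all_y.append(y)
        # Assumed missingness rule: y is lost far more often when x > 0.
        p_missing = 0.8 if x > 0 else 0.1
        if rng.random() >= p_missing:
            observed_y.append(y)     # participant is a complete case
    return sum(all_y) / n, sum(observed_y) / len(observed_y)


full_mean, cc_mean = simulate()
print(f"mean of y, all participants:    {full_mean:.3f}")
print(f"mean of y, complete cases only: {cc_mean:.3f}")
```

Because the missingness here depends on `x`, which is correlated with `y`, the complete cases are not a random subsample, and the complete-case mean drifts away from the full-data mean even with a large sample.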
