Non‐ignorable missing covariate data in survival analysis: a case‐study of an International Breast Cancer Study Group trial

Non-ignorable missing data, a serious problem in both clinical trials and observational studies, can lead to biased inferences. Quality-of-life measures have become increasingly popular in clinical trials. However, these measures are often incompletely observed, and investigators may suspect that the missing quality-of-life data are non-ignorable. Although several recent references have addressed missing covariates in survival analysis, they all assume either that the data are missing at random or that all covariates are discrete. We present a method for estimating the parameters of the Cox proportional hazards model when missing covariates may be non-ignorable and either continuous or discrete. Our method helps to reduce bias and improve efficiency in the presence of missing data. The methodology clearly specifies assumptions about the missing data mechanism and, through sensitivity analysis, helps investigators to understand the potential effect of missing data on study results. Copyright 2004 Royal Statistical Society.
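
As a rough illustration of what "non-ignorable" means in this setting, the sketch below writes the Cox model next to a generic logistic model for the missingness indicator. The notation (a fully observed covariate $z_i$, a possibly missing covariate $x_i$, an indicator $r_i$, and parameters $\phi$) and the logistic form are assumptions chosen for illustration; the abstract does not state the exact specification used in the paper.

```latex
% Illustrative sketch only: notation and the logistic form are assumed, not taken from the paper.
\begin{align*}
  % Cox proportional hazards model with baseline hazard \lambda_0(t)
  \lambda(t \mid x_i, z_i) &= \lambda_0(t)\,\exp(\beta_1 x_i + \beta_2 z_i), \\
  % Missingness model: r_i = 1 if x_i is missing, r_i = 0 if it is observed
  \operatorname{logit}\,\Pr(r_i = 1 \mid x_i, z_i) &= \phi_0 + \phi_1 z_i + \phi_2 x_i .
\end{align*}
% \phi_2 = 0   : missingness depends only on the observed z_i (missing at random).
% \phi_2 \neq 0: missingness depends on the unobserved x_i itself (non-ignorable).
```

Under this assumed form, one common style of sensitivity analysis is to fix $\phi_2$ at a range of plausible values, re-estimate $\beta$ at each, and see how much the conclusions change.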

[1]  T. Louis Finding the Observed Information Matrix When Using the EM Algorithm , 1982 .

[2]  D. Cella,et al.  Comparison of several model-based methods for analysing incomplete quality of life data in cancer clinical trials. , 1998, Statistics in medicine.

[3]  G. Molenberghs,et al.  Linear Mixed Models for Longitudinal Data , 2001 .

[4]  B. Reboussin,et al.  Mixed Effects Logistic Regression Models for Longitudinal Ordinal Functional Response Data with Multiple‐Cause Drop‐Out from the Longitudinal Study of Aging , 2000, Biometrics.

[5]  J G Ibrahim,et al.  Monte Carlo EM for Missing Covariates in Parametric Regression Models , 1999, Biometrics.

[6]  J. Ibrahim,et al.  Likelihood-Based Methods for Missing Covariates in the Cox Proportional Hazards Model , 2001 .

[7]  M. Pepe,et al.  Auxiliary covariate data in failure time regression , 1995 .

[8]  Joseph G. Ibrahim,et al.  A conditional model for incomplete covariates in parametric regression models , 1996 .

[9]  J. Forbes,et al.  Impact of adjuvant therapy on quality of life in women with node-positive operable breast cancer , 1996, The Lancet.

[10]  Niels Keiding,et al.  Statistical Models Based on Counting Processes , 1993 .

[11]  Joseph G. Ibrahim,et al.  Bayesian Survival Analysis , 2004 .

[12]  M. Bonetti,et al.  A Method-of-Moments Estimation Procedure for Categorical Quality-of-Life Data with Nonignorable Missingness , 1999 .

[13]  S. MacEachern,et al.  Bayesian variable selection for proportional hazards models , 1999 .

[14]  M. Zelen,et al.  Effectiveness of adjuvant chemotherapy in combination with tamoxifen for node-positive postmenopausal breast cancer patients. , 1997, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[15]  W. Tsai,et al.  On using the Cox proportional hazards model with missing covariates , 1997 .

[16]  Michael I. Miller,et al.  Latent class models for longitudinal studies of the elderly with data missing at random , 2002 .

[17]  David R. Cox,et al.  Regression models and life tables (with discussion , 1972 .

[18]  Joseph G. Ibrahim,et al.  Incomplete covariates in the Cox model with applications to biological marker data , 2001 .

[19]  Z. Ying,et al.  Cox Regression with Incomplete Covariate Measurements , 1993 .

[20]  J. Ibrahim Incomplete Data in Generalized Linear Models , 1990 .

[21]  J G Ibrahim,et al.  Estimating equations with incomplete categorical covariates in the Cox model. , 1998, Biometrics.

[22]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  N. Breslow Covariance analysis of censored survival data. , 1974, Biometrics.

[24]  Joseph G. Ibrahim,et al.  Missing covariates in generalized linear models when the missing data mechanism is non‐ignorable , 1999 .

[25]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[26]  R. Little,et al.  Proportional hazards regression with missing covariates , 1999 .

[27]  Margaret S. Pepe,et al.  A mean score method for missing and auxiliary covariate data in regression models , 1995 .

[28]  D. Rubin,et al.  Statistical Analysis with Missing Data. , 1989 .

[29]  Torben Martinussen,et al.  Cox Regression with Incomplete Covariate Measurements using the EM‐algorithm , 1999 .

[30]  W. Gilks,et al.  Adaptive Rejection Sampling for Gibbs Sampling , 1992 .

[31]  D B Rubin,et al.  Multiple imputation in health-care databases: an overview and some applications. , 1991, Statistics in medicine.

[32]  Myunghee Cho Paik Multiple Imputation for the Cox Proportional Hazards Model with Missing Covariates , 1997, Lifetime data analysis.

[33]  L Ryan,et al.  Semiparametric Regression Analysis of Interval‐Censored Data , 2000, Biometrics.

[34]  G. C. Wei,et al.  A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms , 1990 .

[35]  Michael E. Miller,et al.  A Marginal Model for Analyzing Discrete Outcomes From Longitudinal Surveys With Outcomes Subject to Multiple-Cause Nonresponse , 2001 .

[36]  D. Harrington,et al.  Counting Processes and Survival Analysis , 1991 .