Theory and Inference for Regression Models with Missing Responses and Covariates.

In this paper, we carry out an in-depth theoretical investigation of inference with missing response and covariate data in general regression models. We assume throughout that the missing data are Missing at Random (MAR) or Missing Completely at Random (MCAR). Previous theoretical investigations in the literature have focused on missing covariates or missing responses, but not both. Here, we consider theoretical properties of the estimates under three estimation settings: the complete case analysis (CC), the complete response analysis (CR), which uses only those subjects whose responses are completely observed, and the all case analysis (AC), which uses all of the cases. Under each setting, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. For the normal linear model, we investigate the three estimation methods theoretically, analytically characterize the loss of information for each method, and derive and compare their asymptotic variances under MAR and MCAR. We also carry out a theoretical investigation of bias for the CC method. A simulation study and a real dataset are given to illustrate the methodology.
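To make the three settings concrete, the following is a minimal sketch, not the estimation scheme of the paper. It assumes a simple normal linear model with one covariate and MCAR missingness in both the response and the covariate. The function names (`case_masks`, `simulate`, `cc_ols`), the missingness rates, and the true coefficients are hypothetical choices made for illustration. Only the CC fit is computed, by ordinary least squares; the CR and AC analyses need a likelihood that integrates over the missing values (for example via EM), which is not reproduced here.

```python
import numpy as np

def case_masks(y, X):
    """Boolean masks selecting the subjects used by the CC, CR and AC analyses."""
    y_obs = ~np.isnan(y)                  # response observed
    x_obs = ~np.isnan(X).any(axis=1)      # every covariate observed
    cc = y_obs & x_obs                    # complete cases
    cr = y_obs                            # complete responses, covariates may be missing
    ac = np.ones(len(y), dtype=bool)      # all cases
    return cc, cr, ac

def simulate(n=2000, beta=(1.0, 2.0), p_miss_y=0.2, p_miss_x=0.3, seed=0):
    """Simulate y = b0 + b1*x + e, then delete values completely at random (assumed MCAR)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta[0] + beta[1] * x + rng.normal(size=n)
    y[rng.random(n) < p_miss_y] = np.nan
    x[rng.random(n) < p_miss_x] = np.nan
    return y, x.reshape(-1, 1)

def cc_ols(y, X):
    """Least-squares fit using complete cases only."""
    cc, _, _ = case_masks(y, X)
    design = np.column_stack([np.ones(cc.sum()), X[cc]])
    coef, *_ = np.linalg.lstsq(design, y[cc], rcond=None)
    return coef

y, X = simulate()
cc, cr, ac = case_masks(y, X)
print("subjects used:", {"CC": int(cc.sum()), "CR": int(cr.sum()), "AC": int(ac.sum())})
print("CC estimate of (intercept, slope):", cc_ols(y, X))
```

The subsets are nested, so CC never uses more subjects than CR, and CR never uses more than AC. In this simulation the missingness is MCAR, so the CC fit stays close to the true coefficients while discarding the partially observed subjects.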
