Likelihood Methods for Regression Models with Expensive Variables Missing by Design

In some regression applications, the values of certain variables are missing by design for some individuals. For example, in two-stage studies (Zhao and Lipsitz, 1992), data on "cheaper" variables are collected on a random sample of individuals in stage I, and "expensive" variables are then measured for a subsample of these individuals in stage II. The "expensive" variables are thus missing by design for stage I individuals who are not selected for stage II. Both estimating function and likelihood methods have been proposed for cases where either covariates or responses are missing. We extend the semiparametric maximum likelihood (SPML) method for missing covariate problems (e.g. Chen, 2004; Ibrahim et al., 2005; Zhang and Rockette, 2005, 2007) to more general cases where covariates and/or responses are missing by design, and show that profile likelihood ratio tests and interval estimation are easily implemented. Simulation studies examine the performance of the likelihood methods and compare their efficiency with that of estimating function methods for problems involving (a) a missing covariate and (b) a missing response variable. We illustrate the ease of implementation of SPML and demonstrate its high efficiency.
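
The two-stage design described above can be made concrete with a small simulation. The following is only a sketch under stated assumptions, not the authors' simulation setup: the function name `two_stage_sample`, the logistic response model, its coefficients, and the response-dependent selection probabilities are all hypothetical choices made for illustration. It generates a stage I sample on which the response `y` and a "cheap" covariate `z` are always observed, then keeps the "expensive" covariate `x` only for individuals selected into stage II.

```python
import math
import random


def two_stage_sample(n, p_select, seed=0):
    """Simulate a two-stage design (illustrative assumptions throughout).

    n        : stage I sample size
    p_select : dict mapping response value (0 or 1) to the probability
               of being selected for stage II (hypothetical design)
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # cheap covariate, always observed
        x = rng.gauss(0.5 * z, 1.0)    # expensive covariate
        # Binary response from an assumed logistic regression model.
        eta = -1.0 + 0.5 * x + 0.5 * z
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        # Stage II selection depends only on the observed response,
        # so x is missing by design rather than by accident.
        r = rng.random() < p_select[y]
        records.append({"y": y, "z": z, "x": x if r else None, "r": r})
    return records


# Usage: keep every case (y = 1) and a 20% subsample of non-cases.
data = two_stage_sample(500, {0: 0.2, 1: 1.0})
n_stage2 = sum(d["r"] for d in data)
print(f"stage I size: {len(data)}, stage II size: {n_stage2}")
```

Because selection here depends only on fully observed data, the missingness mechanism is known to the investigator, which is the setting that both the likelihood and the estimating function methods address.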

[1]  J G Ibrahim,et al.  Monte Carlo EM for Missing Covariates in Parametric Regression Models , 1999, Biometrics.

[2]  Norman E. Breslow,et al.  Maximum Likelihood Estimation of Logistic Regression Parameters under Two‐phase, Outcome‐dependent Sampling , 1997 .

[3]  K. Do,et al.  Efficient and Adaptive Estimation for Semiparametric Models , 1994 .

[4]  Norman E. Breslow,et al.  Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling , 2003 .

[5]  David J. Spiegelhalter,et al.  Analysis of longitudinal binary data from multiphase sampling , 1998 .

[6]  H. Rockette,et al.  On maximum likelihood estimation in parametric regression with missing covariates , 2005 .

[7]  J. Robins,et al.  Estimation of Regression Coefficients When Some Regressors are not Always Observed , 1994 .

[8]  H. Y. Chen Nonparametric and Semiparametric Models for Missing Covariates in Parametric Regression , 2004 .

[9]  James M. Robins,et al.  Efficient estimation of regression parameters from multistage studies with validation of outcome and covariates , 1997 .

[10]  Yi-Hau Chen,et al.  A Pseudoscore Estimator for Regression Problems With Two-Phase Sampling , 2003 .

[11]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 2002 .

[12]  Consistency of Semiparametric Maximum Likelihood Estimators for Two-Phase, Outcome Dependent Sampling , 2000 .

[13]  Jerald F. Lawless,et al.  Semiparametric methods for response‐selective and missing data problems in regression , 1999 .

[14]  Norman E. Breslow,et al.  Logistic regression for two-stage case-control data , 1988 .

[15]  R. Little,et al.  Maximum likelihood estimation for mixed continuous and categorical data with missing values , 1985 .

[16]  Susan A. Murphy,et al.  Observed information in semi-parametric models , 1999 .

[17]  Howard E. Rockette,et al.  An EM algorithm for regression analysis with incomplete covariate information , 2007 .

[18]  Norman E. Breslow,et al.  Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model , 2004 .

[19]  D. Rubin Inference and Missing Data , 1975 .

[20]  D. Rubin,et al.  Statistical Analysis with Missing Data , 1988 .

[21]  Consistency of semiparametric maximum likelihood estimators for two‐phase sampling , 2001 .

[22]  J Halpern,et al.  Multi-stage sampling in genetic epidemiology. , 1997, Statistics in medicine.

[23]  Joanna Elizabeth Mills,et al.  The analysis of longitudinal binary data , 2000 .

[24]  A. W. van der Vaart,et al.  On Profile Likelihood , 2000 .

[25]  A. Scott,et al.  Fitting regression models to case-control data by maximum likelihood , 1997 .

[26]  J. Ibrahim Incomplete Data in Generalized Linear Models , 1990 .

[27]  L P Zhao,et al.  Designs and analysis of two-stage studies. , 1992, Statistics in medicine.

[28]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 1988 .

[29]  S. Lipsitz,et al.  Missing-Data Methods for Generalized Linear Models , 2005 .

[30]  D. McLeish,et al.  Estimation of regression parameters in missing data problems , 2006 .