Sieve Maximum Likelihood Estimation for Regression Models With Covariates Missing at Random

Missing covariates are common in regression problems. We propose a new semiparametric method that places a fully nonparametric distribution on the missing covariates, which are assumed to be missing at random. Sieve maximum likelihood estimation is used to estimate the regression coefficients. The resulting estimators are shown to be consistent and asymptotically normal, with an asymptotic covariance matrix that achieves the semiparametric efficiency bound. A bootstrap approach is used to estimate the asymptotic covariance matrix. Some practical modeling approaches for high-dimensional covariates are also proposed. Extensive simulation studies are conducted to examine the finite-sample properties of the estimators, and a real data set from a liver cancer clinical trial is analyzed using the proposed method.
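
As a rough illustration of the bootstrap step only, the sketch below estimates the covariance matrix of a vector of regression coefficients by resampling observations with replacement. It is not the proposed procedure: a closed-form simple linear regression on complete data stands in for the sieve maximum likelihood estimator, and the names `ols_fit`, `bootstrap_covariance`, and `make_data`, along with the simulated data, are hypothetical choices made for this example.

```python
import random


def ols_fit(data):
    """Closed-form simple linear regression; returns (intercept, slope).

    Stand-in estimator for illustration; the sieve MLE would go here.
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)


def bootstrap_covariance(data, estimator, n_boot=500, seed=0):
    """Nonparametric bootstrap estimate of the estimator's covariance matrix."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations with replacement and refit.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    p = len(estimates[0])
    means = [sum(e[j] for e in estimates) / n_boot for j in range(p)]
    # Sample covariance of the bootstrap replicates.
    return [
        [
            sum((e[j] - means[j]) * (e[k] - means[k]) for e in estimates)
            / (n_boot - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def make_data(n, seed=0):
    """Simulate (x, y) pairs from y = 1 + 2x + noise (assumed toy model)."""
    rng = random.Random(seed)
    return [(x, 1.0 + 2.0 * x + rng.gauss(0.0, 1.0))
            for x in (rng.gauss(0.0, 1.0) for _ in range(n))]


if __name__ == "__main__":
    sample = make_data(200, seed=1)
    print("estimate:", ols_fit(sample))
    print("bootstrap covariance:", bootstrap_covariance(sample, ols_fit))
```

Any estimator that maps a list of observations to a tuple of coefficients can be passed in place of `ols_fit`; with missing covariates, each resampled record would carry its own missingness pattern along with it.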
