Weighted Semiparametric Estimation in Regression Analysis with Missing Covariate Data

Abstract This article investigates estimation of the regression coefficients in an assumed mean function when covariate values are missing for some subjects. We examine the performance of a Horvitz and Thompson (1952)-type weighted estimator under different estimates of the selection probabilities, which may be treated as nuisance parameters (or a nuisance function). In particular, we investigate the properties of the estimator of the regression parameters when the selection probabilities are estimated by kernel smoothers. We present large-sample theory for the new estimator and conduct simulation studies comparing the proposed estimator with the maximum likelihood estimator and multiple imputation under various model assumptions and different missingness mechanisms. In addition, we provide two real-data examples that motivate this investigation.
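As a rough illustration of the kind of estimator the abstract describes, the following Python sketch weights each complete case by the inverse of a kernel-smoothed selection probability and then solves the weighted least-squares estimating equation. Everything specific here is an assumption made for the example rather than the article's own setup: the linear mean function, the missingness mechanism that depends only on the always-observed outcome `y`, the Gaussian kernel, the rule-of-thumb bandwidth, and the function names `nw_selection_prob`, `weighted_linear_fit`, and `simulate`.

```python
import numpy as np


def nw_selection_prob(v, delta, bandwidth):
    """Nadaraya-Watson estimate of P(delta = 1 | V = v_i), Gaussian kernel.

    v     : always-observed variable that drives missingness (assumed 1-D)
    delta : 1.0 if the covariate is observed for subject i, else 0.0
    """
    diffs = (v[:, None] - v[None, :]) / bandwidth
    k = np.exp(-0.5 * diffs**2)           # n x n kernel weights
    return (k @ delta) / k.sum(axis=1)    # local fraction of observed cases


def weighted_linear_fit(x, y, delta, pi_hat):
    """Solve sum_i (delta_i / pi_i) * x_i * (y_i - x_i' beta) = 0 for beta."""
    # Incomplete cases get weight exactly 0, so their placeholder x is ignored.
    w = np.divide(delta, pi_hat, out=np.zeros_like(pi_hat), where=delta == 1)
    design = np.column_stack([np.ones_like(x), x])
    xtwx = design.T @ (w[:, None] * design)
    xtwy = design.T @ (w * y)
    return np.linalg.solve(xtwx, xtwy)


def simulate(n=2000, seed=0):
    """Toy data: y = 1 + 2x + noise; x is missing with probability depending on y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    pi_true = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * y)))   # assumed mechanism
    delta = (rng.uniform(size=n) < pi_true).astype(float)
    x_obs = np.where(delta == 1, x, 0.0)               # 0.0 is a placeholder
    return x_obs, y, delta


if __name__ == "__main__":
    x_obs, y, delta = simulate()
    h = y.std() * len(y) ** (-1 / 3)      # illustrative bandwidth, not a recommendation
    pi_hat = nw_selection_prob(y, delta, h)
    beta_w = weighted_linear_fit(x_obs, y, delta, pi_hat)
    # Complete-case fit for comparison: unit weights on observed rows only.
    beta_cc = weighted_linear_fit(x_obs, y, delta, np.ones_like(y))
    print("weighted estimate (intercept, slope):", beta_w)
    print("complete-case estimate (intercept, slope):", beta_cc)
```

Because missingness in this toy example depends on the outcome, the unweighted complete-case fit is generally biased, while the inverse-probability weights are meant to correct for the selection. The bandwidth matters in practice, and the sketch makes no attempt to choose it in a principled way.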

[1]  L. P. Zhao, et al. Designs and analysis of two-stage studies, 1992, Statistics in Medicine.

[2]  A. Kristal, et al. Obesity, alcohol, and tobacco as risk factors for cancers of the esophagus and gastric cardia: adenocarcinoma versus squamous cell carcinoma, 1995, Cancer Epidemiology, Biomarkers & Prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology.

[3]  W. J. Hall, et al. Information and Asymptotic Efficiency in Parametric-Nonparametric Models, 1983.

[4]  J. M. Taylor, et al. Estimating the distribution of times from HIV seroconversion to AIDS using multiple imputation. Multicentre AIDS Cohort Study, 1990, Statistics in Medicine.

[5]  P. Rosenbaum. Model-Based Direct Adjustment, 1987.

[6]  Roderick J. A. Little. Regression with Missing X's: A Review, 1992.

[7]  D. Rubin, et al. Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse, 1986.

[8]  Raymond J. Carroll, et al. Semiparametric Estimation in Logistic Measurement Error Models, 1989.

[9]  D. Rubin, et al. Statistical Analysis with Missing Data, 1989.

[10]  Thomas R. Fleming, et al. A Nonparametric Method for Dealing with Mismeasured Covariate Data, 1991.

[11]  Raymond J. Carroll, et al. Dimension Reduction in a Semiparametric Regression Model with Errors in Covariates, 1995.

[12]  D. Rubin, et al. The central role of the propensity score in observational studies for causal effects, 1983.

[13]  Daniel F. Heitjan, et al. Assessing Secular Trends in Blood Pressure: A Multiple-Imputation Approach, 1994.

[14]  S. F. Buck. A Method of Estimation of Missing Values in Multivariate Data Suitable for Use with an Electronic Computer, 1960.

[15]  M. Schluchter, et al. Logistic regression with a partially observed covariate, 1989.

[16]  Marcel G. Dagenais, et al. The use of incomplete observations in multiple regression analysis: A generalized least squares approach, 1973.

[17]  S. Greenland, et al. Analytic methods for two-stage case-control studies and other stratified designs, 1991, Statistics in Medicine.

[18]  D. Rubin. Multiple imputation for nonresponse in surveys, 1989.

[19]  E. Nadaraya. On Non-Parametric Estimates of Density Functions and Regression Curves, 1965.

[20]  Margaret S. Pepe, et al. A mean score method for missing and auxiliary covariate data in regression models, 1995.

[21]  R. Pyke, et al. Logistic disease incidence models and case-control studies, 1979.

[22]  R. Carroll, et al. Prospective Analysis of Logistic Case-Control Studies, 1995.

[23]  Raymond J. Carroll, et al. On robust estimation in logistic case-control studies, 1993.

[24]  R. Little. Pattern-Mixture Models for Multivariate Incomplete Data, 1993.

[25]  D. Rubin. Inference and Missing Data, 1975.

[26]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[27]  Donald B. Rubin, et al. Characterizing the effect of matching using linear propensity score methods with normal distributions, 1992.

[28]  D. Pierce. The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics, 1982.

[29]  Xiao-Li Meng, et al. The AIDS Epidemic: Estimating Survival After AIDS Diagnosis From Surveillance Data, 1993.

[30]  D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.

[31]  G. S. Watson, et al. Smooth regression analysis, 1964.

[32]  Jianqing Fan. Design-adaptive Nonparametric Regression, 1992.

[33]  E. Nadaraya. On Estimating Regression, 1964.

[34]  Robert V. Foutz, et al. On the Unique Consistent Solution to the Likelihood Equations, 1977.

[35]  G. C. Tiao, et al. Bayesian inference in statistical analysis, 1973.

[36]  R. Little, et al. Maximum likelihood estimation for mixed continuous and categorical data with missing values, 1985.

[37]  R. Little. Survey Nonresponse Adjustments for Estimates of Means, 1986.

[38]  J. C. van Houwelingen, et al. A goodness-of-fit test for binary regression models, based on smoothing methods, 1991.

[39]  J. Robins, et al. Estimation of Regression Coefficients When Some Regressors are not Always Observed, 1994.

[40]  W. Cleveland. Robust Locally Weighted Regression and Smoothing Scatterplots, 1979.

[41]  J. Copas. Plotting p against x, 1983.

[42]  W. Newey, et al. Kernel Estimation of Partial Means and a General Variance Estimator, 1994, Econometric Theory.

[43]  Norman E. Breslow, et al. Logistic regression for two-stage case-control data, 1988.

[44]  D. Rubin, et al. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score, 1984.