Robust Likelihood-based Analysis of Multivariate Data with Missing Values

The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of both the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter are correct. Extensions to more general missing-data patterns are outlined.
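The idea of predicting missing values from a penalized spline of the propensity score can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single outcome with missing values, fully observed covariates, a logistic response model, a truncated-linear spline basis, and a fixed smoothing parameter `lam` (in practice the penalty is often estimated, for example through a mixed-model formulation). All function names and parameters below are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25, ridge=1e-6):
    """Newton-Raphson logistic regression; X must include an intercept column.
    A tiny ridge term is an assumption added here for numerical stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        grad = X.T @ (r - p) - ridge * beta
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def spline_basis(s, knots):
    """Truncated-linear basis: intercept, s, and (s - k)_+ for each knot k."""
    cols = [np.ones_like(s), s] + [np.maximum(s - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_penalized_spline(B, y, lam):
    """Ridge-type penalized least squares; only the knot coefficients
    (all columns after the first two) are penalized."""
    D = np.eye(B.shape[1])
    D[:2, :2] = 0.0
    return np.linalg.solve(B.T @ B + lam * D, B.T @ y)

def ppsp_mean(X, y, observed, n_knots=10, lam=1.0):
    """Estimate the mean of y under MAR by imputing missing values from a
    penalized spline of the estimated logit propensity to be observed.
    X: (n, p) fully observed covariates; y: (n,) outcome, NaN where missing;
    observed: (n,) boolean response indicator."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = fit_logistic(Xd, observed.astype(float))
    s = Xd @ beta  # estimated logit of the propensity score
    # Place knots at interior quantiles of the score among respondents.
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = np.quantile(s[observed], probs)
    B = spline_basis(s, knots)
    coef = fit_penalized_spline(B[observed], y[observed], lam)
    y_hat = B @ coef
    y_completed = np.where(observed, y, y_hat)  # keep observed, impute missing
    return y_completed.mean()
```

Under these assumptions, `ppsp_mean(X, y, observed)` returns a point estimate of the outcome mean. The sketch omits variance estimation and the additional parametric covariate terms that a fuller model could include alongside the spline.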
