Semiparametric Double Balancing Score Estimation for Incomplete Data With Ignorable Missingness

When estimating the marginal mean response with missing observations, a critical issue is robustness to model misspecification. In this article, we propose a semiparametric estimation method with extended double robustness that attains the optimal efficiency under less stringent requirements on model specification than existing doubly robust estimators. In this semiparametric estimation, covariate information is collapsed into a two-dimensional score S, with one dimension for (i) the pattern of missingness and the other for (ii) the pattern of response, both estimated from working parametric models. The mean response E(Y) is then estimated by the sample mean of E(Y∣S), which is itself estimated via nonparametric regression. The semiparametric estimator is consistent if either the “core” of (i) or the “core” of (ii) is captured by S, and attains the optimal efficiency if both are captured by S. As the “cores” can be obtained without correctly specifying the full parametric models for (i) or (ii), the proposed estimator can be more robust than other doubly robust estimators. As S contains the propensity score as one component, the proposed estimator avoids the use and the shortcomings of inverse propensity weighting. This semiparametric estimator is most appealing for high-dimensional covariates, where fully correct model specification is challenging and nonparametric estimation is not feasible due to the curse of dimensionality. Numerical performance is investigated by simulation studies.
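The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of a logistic working model for missingness, a linear working model for the response, a Gaussian product-kernel Nadaraya-Watson smoother, the fixed bandwidth, and all function names (`fit_logistic`, `fit_linear`, `double_score_mean`) are assumptions made here for illustration only.

```python
import numpy as np

def fit_logistic(X, r, iters=25):
    """Assumed working model (i): logistic regression of the response
    indicator r on X, fitted by Newton's method. Returns fitted propensities."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        # Small ridge term keeps the Hessian invertible.
        H = Xd.T @ (Xd * w[:, None]) + 1e-8 * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(H, Xd.T @ (r - p))
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def fit_linear(X, y, r):
    """Assumed working model (ii): least squares of y on X among observed
    units. Returns the fitted (prognostic) score for every unit."""
    Xd = np.column_stack([np.ones(len(X)), X])
    obs = r == 1
    coef, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    return Xd @ coef

def double_score_mean(X, y, r, bandwidth=0.3):
    """Estimate E(Y) as the sample mean of a kernel estimate of E(Y | S),
    where S = (propensity score, prognostic score)."""
    S = np.column_stack([fit_logistic(X, r), fit_linear(X, y, r)])
    S = (S - S.mean(axis=0)) / S.std(axis=0)  # put both scores on one scale
    obs = r == 1
    So, yo = S[obs], y[obs]
    # Squared distances from every unit to every observed unit in score space.
    d2 = ((S[:, None, :] - So[None, :, :]) ** 2).sum(axis=-1)
    # Subtracting the row minimum leaves the weight ratios unchanged
    # and prevents all weights in a row from underflowing to zero.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    K = np.exp(-0.5 * d2 / bandwidth ** 2)
    m_hat = (K @ yo) / K.sum(axis=1)  # Nadaraya-Watson estimate of E(Y | S)
    return m_hat.mean()               # average over all units, observed or not
```

Here `y` may hold arbitrary values (for example `NaN`) wherever `r == 0`, since only observed responses enter the fits. In practice the bandwidth would be chosen by a data-driven rule rather than fixed, and boundary-corrected or local linear smoothers could replace the plain kernel average.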
