Identification and estimation of nonignorable missing outcome mean without identifying the full data distribution

We consider the problem of making inference about the population mean of an outcome subject to nonignorable missingness. By leveraging a so-called shadow variable for the outcome, we propose a novel condition that ensures nonparametric identification of the outcome mean even though the full data distribution is not identified. The identifying condition requires that a representer equation connecting the shadow variable to the outcome mean has a solution. Under this condition, we use sieves to solve the representer equation nonparametrically and propose an estimator that avoids modeling the propensity score or the outcome regression. We establish the asymptotic properties of the proposed estimator. We also show that the estimator is locally efficient and attains the semiparametric efficiency bound for the shadow variable model under certain regularity conditions. We illustrate the proposed approach via simulations and a real data application on home pricing.
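To make the sieve step concrete, the following is a minimal sketch on simulated data; it is not the authors' implementation. It assumes, purely for illustration, that there are no covariates, that the shadow variable Z is independent of the response indicator R given the outcome Y, and that the representer equation takes the form E[delta(Z) | R = 1, Y] = Y, so that the mean can be written as E[R Y + (1 - R) delta(Z)]. This form of the equation, the polynomial bases, the function names `sieve_basis` and `shadow_mean`, and the simulation design are all assumptions of the sketch and are not taken from the abstract.

```python
import numpy as np

def sieve_basis(v, degree):
    """Polynomial sieve basis [1, v, ..., v^degree], one column per term."""
    return np.vander(v, degree + 1, increasing=True)

def shadow_mean(y_obs, r, z, deg_z=1, deg_y=2):
    """Hypothetical sieve estimator of E[Y] using a shadow variable z.

    y_obs holds the outcome where r == 1 and an arbitrary filler elsewhere;
    the filler never enters because every use of y_obs is multiplied by r.
    """
    n = len(r)
    P = sieve_basis(z, deg_z)       # delta(Z) is approximated by P @ a
    Q = sieve_basis(y_obs, deg_y)   # test functions of Y for the moments
    RQ = Q * r[:, None]             # zero out rows with a missing outcome
    # Sample version of E[R q(Y) {delta(Z) - Y}] = 0, solved by least squares.
    A = RQ.T @ P / n
    b = RQ.T @ y_obs / n
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    delta = P @ a
    # Plug-in mean: observed Y where available, delta(Z) otherwise.
    return np.mean(r * y_obs + (1.0 - r) * delta)

# Simulated example (assumed design): the true mean of Y is 0.
rng = np.random.default_rng(0)
n = 20000
y = rng.normal(size=n)
z = y + rng.normal(size=n)                  # shadow variable: depends on Y, not on R
p_resp = 1.0 / (1.0 + np.exp(-(1.0 + y)))   # response probability depends on Y itself
r = (rng.uniform(size=n) < p_resp).astype(float)
y_obs = np.where(r == 1.0, y, 0.0)          # outcome is unavailable when r == 0

mu_hat = shadow_mean(y_obs, r, z)
naive = y_obs[r == 1.0].mean()              # complete-case mean, biased upward here
print(f"sieve estimate: {mu_hat:.3f}, complete-case mean: {naive:.3f}")
```

In this design delta(z) = z solves the assumed equation exactly, so a linear basis in Z suffices; in general the basis dimensions would have to grow with the sample size. Note that the sketch fits neither a propensity score model nor an outcome regression, which is the feature the abstract emphasizes.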
