Empirical‐likelihood‐based inference in missing response problems and its application in observational studies

Summary. The problem of missing response data is ubiquitous in medical and social science studies. In the case of responses that are missing at random (depending on some covariate information), analyses focused only on the complete data may lead to biased results. Various debiasing methods have been extensively studied in the literature, particularly the weighting method that was motivated by Horvitz and Thompson's estimators. To improve efficiency, Robins, Rotnitzky and Zhao proposed augmented estimating equations based on corrected complete‐case analyses. A nice feature of the augmented method is its ‘double robustness’, i.e. the estimator that is derived from the augmented method is asymptotically unbiased if either the underlying missing data mechanism or the underlying regression function is correctly specified. Furthermore, the augmented estimator can achieve full efficiency if both the missing data mechanism and the regression function are correctly specified. In general, however, it is very difficult to specify the regression function correctly, especially when the dimension of covariates is high; this is the so‐called curse of dimensionality. The augmented estimator has much lower efficiency if the ‘working regression model’ is not close to the true regression model. In this paper, the empirical likelihood method is employed to obtain a constrained empirical likelihood estimator of the mean response under the assumption that responses are missing at random. The empirical‐likelihood‐based estimators enjoy the double‐robustness property. Moreover, empirical‐likelihood‐based inference may produce asymptotically unbiased and efficient estimators even if the true regression function is not completely known. Simulation results indicate that the empirical‐likelihood‐based estimators are very robust to a misspecification of the propensity score and dominate their competitors in the sense of having smaller mean‐square errors.
The methods developed in this paper apply naturally to causal inference in observational studies, where the propensity score is used to adjust for differences in pretreatment variables in the estimation of average treatment effects.
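
The weighting and augmented estimators that the summary contrasts can be illustrated with a small simulation. The sketch below is not the empirical‐likelihood estimator of the paper; it only shows the complete‐case average, a Horvitz and Thompson style inverse‐probability‐weighted mean, and an augmented estimator of the mean response. The data‐generating model, the linear working regression, and all function names are assumptions made for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical data: Y depends on X, and so does the chance of observing Y."""
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)    # true mean response is 1.0
    pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))     # P(R = 1 | X): missing at random
    r = rng.binomial(1, pi)                   # R = 1 when Y is observed
    y_obs = np.where(r == 1, y, 0.0)          # placeholder 0.0 where Y is missing
    return x, y_obs, r, pi


def complete_case_mean(y_obs, r):
    """Average of the observed responses only (biased here, since R depends on X)."""
    return y_obs[r == 1].mean()


def ipw_mean(y_obs, r, pi):
    """Inverse-probability-weighted mean: each observed Y is weighted by 1 / pi."""
    return np.mean(r * y_obs / pi)


def fit_linear(x, y_obs, r):
    """Working regression: least squares of Y on (1, X) using complete cases."""
    obs = r == 1
    design = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta, *_ = np.linalg.lstsq(design, y_obs[obs], rcond=None)
    return beta[0] + beta[1] * x              # fitted values for every unit


def aipw_mean(y_obs, r, pi, m):
    """Augmented estimator: weighted mean minus a correction built from m(X)."""
    return np.mean(r * y_obs / pi - (r - pi) / pi * m)


rng = np.random.default_rng(0)
x, y_obs, r, pi = simulate(200_000, rng)

m_good = fit_linear(x, y_obs, r)                       # correctly specified regression
m_bad = np.full_like(x, complete_case_mean(y_obs, r))  # deliberately wrong: a constant
pi_bad = np.full_like(x, r.mean())                     # deliberately wrong: ignores X

print("complete case            :", complete_case_mean(y_obs, r))
print("weighted, true pi        :", ipw_mean(y_obs, r, pi))
print("augmented, both correct  :", aipw_mean(y_obs, r, pi, m_good))
print("augmented, wrong pi      :", aipw_mean(y_obs, r, pi_bad, m_good))
print("augmented, wrong m       :", aipw_mean(y_obs, r, pi, m_bad))
```

In this setup the complete‐case average lands well above the true mean of 1.0, while the weighted estimate and all three augmented estimates stay close to it. The last two lines are the double‐robustness property in miniature: the augmented estimator remains approximately unbiased when only one of the two working models is correct.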

[1]  Y. Vardi,et al.  Nonparametric Estimation in the Presence of Length Bias , 1982 .

[2]  Shelby J. Haberman,et al.  Adjustment by Minimum Discriminant Information , 1984 .

[3]  James J. Heckman,et al.  Characterizing Selection Bias Using Experimental Data , 1998 .

[4]  D. Rubin INFERENCE AND MISSING DATA , 1975 .

[5]  Feiming Chen,et al.  Empirical likelihood inference for censored median regression model via nonparametric kernel estimation , 2008 .

[6]  Art B. Owen,et al.  Empirical Likelihood for Linear Models , 1991 .

[7]  J. N. K. Rao,et al.  Empirical likelihood-based inference under imputation for missing response data , 2002 .

[8]  Jing Qin,et al.  Empirical Likelihood in Biased Sample Problems , 1993 .

[9]  D. Rubin [On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9.] Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies , 1990 .

[10]  Yuichi Kitamura,et al.  Empirical likelihood methods with weakly dependent processes , 1997 .

[11]  G. Imbens Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review , 2004 .

[12]  H. White Maximum Likelihood Estimation of Misspecified Models , 1982 .

[13]  D. Horvitz,et al.  A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .

[14]  G. Imbens,et al.  Combining Micro and Macro Data in Microeconometric Models , 1994 .

[15]  E. Korn,et al.  Clinician Preferences and the Estimation of Causal Treatment Differences , 1998 .

[16]  Peter B. Gilbert Large sample theory of maximum likelihood estimates in semiparametric biased sampling models , 2000 .

[17]  J. Lawless,et al.  Empirical Likelihood and General Estimating Equations , 1994 .

[18]  J. Robins,et al.  Estimation of Regression Coefficients When Some Regressors are not Always Observed , 1994 .

[19]  J. Robins,et al.  Analysis of semiparametric regression models for repeated outcomes in the presence of missing data , 1995 .

[20]  Subhash R. Lele,et al.  Maximum likelihood estimation in semiparametric selection bias models with application to AIDS vaccine trials , 1999 .

[21]  R. Serfling Approximation Theorems of Mathematical Statistics , 1980 .

[22]  Richard D. Gill,et al.  Large sample theory of empirical distributions in biased sampling models , 1988 .

[23]  J. Hahn On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects , 1998 .

[24]  G. Imbens,et al.  Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score , 2000 .

[25]  A. Owen Empirical likelihood ratio confidence intervals for a single functional , 1988 .

[26]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 1988 .

[27]  A. Owen Empirical Likelihood Ratio Confidence Regions , 1990 .

[28]  D. Rubin,et al.  The central role of the propensity score in observational studies for causal effects , 1983 .

[29]  N. Breslow Are Statistical Contributions to Medicine Undervalued? , 2003, Biometrics.

[30]  Jiahua Chen,et al.  Empirical likelihood estimation for finite populations and the effective usage of auxiliary information , 1993 .

[31]  Y. Vardi Empirical Distributions in Selection Bias Models , 1985 .

[32]  Yuichi Kitamura,et al.  Testing conditional moment restrictions , 2003 .

[33]  Zhiqiang Tan,et al.  A Distributional Approach for Causal Inference Using Propensity Scores , 2006 .

[34]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 2002 .