Efficient Quantile Regression Analysis With Missing Observations

This article examines the problem of estimation in a quantile regression model when observations are missing at random and the errors are independent but not identically distributed. We consider three approaches to handling this problem, based on nonparametric inverse probability weighting, estimating equations projection, and a combination of both. An important distinguishing feature of our methods is their ability to handle missing responses and/or partially missing covariates, whereas existing techniques can handle only one or the other, but not both. We prove that our methods yield asymptotically equivalent estimators that achieve the desirable asymptotic properties of unbiasedness, normality, and √n-consistency. Because we do not assume that the errors are identically distributed, our theoretical results are valid under heteroscedasticity, a particularly strong feature of our methods. In the special case of identical error distributions, all of our proposed estimators achieve the semiparametric efficiency bound. To facilitate the practical implementation of these methods, we develop an iterative method based on the majorize/minimize (MM) algorithm for computing the quantile regression estimates, and a bootstrap method for computing their variances. Our simulation findings suggest that all three methods have good finite-sample properties. We further illustrate these methods with a real data example. Supplementary materials for this article are available online.
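
To make the first of the three approaches concrete, the following is a minimal sketch, not the authors' implementation, of inverse-probability-weighted quantile regression computed by MM iterations. It covers only the missing-response case with a fully observed scalar covariate. The function names (nw_propensity, ipw_quantile_mm), the Gaussian kernel, the bandwidth of 0.2, the smoothing constant eps, and the simulated data are all assumptions made for illustration. Each MM step majorizes the check loss by a quadratic, so the update reduces to a weighted least-squares solve.

```python
import numpy as np


def nw_propensity(z, delta, bandwidth):
    """Nadaraya-Watson estimate of P(delta = 1 | z) with a Gaussian kernel.

    Hypothetical helper: the kernel and bandwidth are illustrative choices.
    """
    diffs = (z[:, None] - z[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2)
    return (kernel @ delta) / kernel.sum(axis=1)


def ipw_quantile_mm(X, y, weights, tau, eps=1e-6, max_iter=500, tol=1e-8):
    """Weighted quantile regression by MM iterations (illustrative sketch).

    Uses rho_tau(r) = |r|/2 + (tau - 1/2) r and the quadratic majorizer
    |r| <= r^2 / (2 |r_k|) + |r_k| / 2, with eps added for stability.
    """
    sw = np.sqrt(weights)
    # Start from the weighted least-squares fit.
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    for _ in range(max_iter):
        resid = y - X @ beta
        a = weights / (eps + np.abs(resid))  # IPW weight times MM weight
        lhs = X.T @ (a[:, None] * X)
        rhs = X.T @ (a * y) + (2.0 * tau - 1.0) * (X.T @ weights)
        beta_new = np.linalg.solve(lhs, rhs)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Simulated example (assumed design): heteroscedastic errors, and a response
# that is missing at random with probability depending on the covariate.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, n)
y_full = 1.0 + 2.0 * x + (0.5 + 0.5 * x) * rng.standard_normal(n)
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))
delta = rng.uniform(size=n) < pi_true  # True where the response is observed

pi_hat = nw_propensity(x, delta.astype(float), bandwidth=0.2)
obs = delta
X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
beta_hat = ipw_quantile_mm(X_obs, y_full[obs], 1.0 / pi_hat[obs], tau=0.5)
print(beta_hat)  # median regression; the true coefficients are (1, 2)
```

Only complete cases enter the fit, each reweighted by the inverse of its estimated observation probability. A bootstrap variance estimate could be obtained by resampling rows and repeating both the propensity and the MM steps, although that is not shown here.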
