A general framework for quantile estimation with incomplete data

Quantile estimation has attracted significant research interest in recent years. However, the literature on quantile estimation in the presence of incomplete data remains limited. In this paper, we propose a general framework to address this problem. Our framework combines two widely adopted approaches to missing data analysis, the imputation approach and the inverse probability weighting approach, via the empirical likelihood method. The proposed method is capable of dealing with many different missingness settings. We mainly study three of them: (i) estimating the marginal quantile of a response that is subject to missingness while there are fully observed covariates; (ii) estimating the conditional quantile of a fully observed response while the covariates are partially available; and (iii) estimating the conditional quantile of a response that is subject to missingness with fully observed covariates and extra auxiliary variables. The proposed method allows multiple models for both the missingness probability and the data distribution. The resulting estimators are multiply robust in the sense that they are consistent if any one of these models is correctly specified. The asymptotic distributions are established using empirical process theory.
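
To make setting (i) concrete, the following is a minimal sketch of the plain inverse probability weighting ingredient only, not the empirical-likelihood estimator proposed in the paper. It assumes the missingness probability is known exactly, whereas in practice it would be modeled and possibly misspecified. The function name `ipw_quantile`, the simulated data, and the logistic missingness mechanism are all illustrative assumptions.

```python
import numpy as np

def ipw_quantile(y, observed, propensity, tau):
    """Weighted tau-quantile of the observed responses.

    Each observed case gets weight 1 / propensity, so that observed cases
    stand in for similar cases whose response is missing.
    (Hypothetical helper for illustration only.)
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    weights = 1.0 / np.asarray(propensity, dtype=float)[observed]
    y_obs = y[observed]

    order = np.argsort(y_obs)
    y_sorted = y_obs[order]
    # Weighted empirical CDF evaluated at each sorted observed response.
    cdf = np.cumsum(weights[order]) / weights.sum()
    # Smallest observed value whose weighted CDF reaches tau.
    idx = np.searchsorted(cdf, tau, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]

# Simulated example (assumed setup): the response is missing at random
# given a fully observed covariate x, and its true marginal median is 0.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))   # true observation probability
r = rng.random(n) < pi                  # True where y is observed
y_incomplete = np.where(r, y, np.nan)

ipw_median = ipw_quantile(y_incomplete, r, pi, tau=0.5)
naive_median = np.median(y_incomplete[r])  # complete-case median, biased here
print(ipw_median, naive_median)
```

In this simulation large values of `x` are more likely to be observed, so the complete-case median is pulled upward while the weighted estimate stays near the true median. The weighting step relies entirely on `pi` being correct, which is the weakness that combining several missingness and data-distribution models is meant to guard against.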
