A robust imputation method for missing responses and covariates in sample selection models

Sample selection arises when the outcome of interest is only partially observed in a study. Although sophisticated parametric and non-parametric statistical methods have been proposed to address this problem, it remains unclear how to handle selectively missing covariate data using simple multiple imputation techniques, especially in the absence of exclusion restrictions and under deviations from normality. Motivated by the 2003–2004 NHANES data, in which previous authors studied the effect of socio-economic status on blood pressure with missing data on the income variable, we propose a robust imputation technique based on the selection-t sample selection model. The imputation method, which is developed within the frequentist framework, is compared with competing alternatives in a simulation study. The results indicate that the robust alternative is not sensitive to the absence of exclusion restrictions, a property inherited from the parent selection-t model, and that it performs better than models based on the normal assumption even when the data are generated from the normal distribution. Applications to missing outcome and covariate data further corroborate the robustness properties of the proposed method. We implemented the proposed approach within the MICE environment in the R statistical software.
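To make the setting concrete, the following is a minimal Python sketch of the kind of data-generating mechanism described above: an outcome equation and a selection equation that share the same covariate (so there is no exclusion restriction), with errors drawn from a bivariate t distribution. It is an illustration only and is not the authors' imputation method or their R implementation; the function name `simulate_selection_t` and all parameter values (`rho`, `nu`, the regression coefficients) are assumptions chosen for the example.

```python
import math
import random


def simulate_selection_t(n=20000, rho=0.7, nu=5, seed=1):
    """Simulate outcomes that are observed only when a correlated latent index is positive.

    Returns (mean of all outcomes, mean of observed outcomes, fraction observed).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Correlated standard normal pair: u drives selection, e drives the outcome.
        u = z1
        e = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        # Dividing both by the same chi-square mixing variable gives bivariate t errors.
        w = rng.gammavariate(nu / 2.0, 2.0) / nu
        u, e = u / math.sqrt(w), e / math.sqrt(w)
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 0.5 * x + e              # outcome equation
        selected = (0.5 * x + u) > 0.0     # selection equation uses the same covariate only
        full.append(y)
        if selected:
            observed.append(y)
    return sum(full) / n, sum(observed) / len(observed), len(observed) / n


if __name__ == "__main__":
    full_mean, observed_mean, fraction = simulate_selection_t()
    print(f"mean of all outcomes:      {full_mean:.3f}")
    print(f"mean of observed outcomes: {observed_mean:.3f}")
    print(f"fraction observed:         {fraction:.3f}")
```

Because the selection and outcome errors are positively correlated in this sketch, the mean of the observed outcomes overstates the mean of all outcomes. That gap is the bias a selection-aware imputation model is meant to correct, and one that a complete-case analysis or an imputation model ignoring the selection mechanism would leave in place.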
