Copula selection models for non‐Gaussian outcomes that are missing not at random

Missing not at random (MNAR) data pose key challenges for statistical inference because the substantive model of interest is typically not identifiable without imposing further (eg, distributional) assumptions. Selection models have been routinely used for handling MNAR data by jointly modeling the outcome and selection variables, typically under the assumption that these follow a bivariate normal distribution. Recent studies have advocated parametric selection approaches, estimated for example by multiple imputation and maximum likelihood, that are more robust to departures from normality than approaches assuming that nonresponse and outcome are jointly normally distributed. However, the proposed methods have been mostly restricted to a specific joint distribution (eg, bivariate t-distribution). This paper discusses a flexible copula-based selection approach that accommodates a wide range of non-Gaussian outcome distributions and allows considerable freedom in the functional form specified for both the outcome and selection equations. It also proposes an imputation procedure that generates plausible imputed values from the copula selection model. A simulation study characterizes the performance of the copula model relative to the most commonly used selection models for estimating average treatment effects with MNAR data. We illustrate the methods in the REFLUX study, which evaluates the effect of laparoscopic surgery on long-term quality of life in patients with reflux disease. We provide software code for implementing the proposed copula framework using the R package GJRM.
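To make the MNAR problem concrete, the following minimal sketch simulates a non-Gaussian outcome whose dependence on the response indicator is induced by a copula. It is an illustration under stated assumptions, not the authors' model or the GJRM implementation: the Gaussian copula, the exponential outcome margin, the 50% response rate, and the names `simulate`, `rho`, and `seed` are all hypothetical choices made here.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate(n=20000, rho=0.7, seed=1):
    """Simulate an exponential outcome observed only when a latent
    selection variable is positive (hypothetical illustrative setup).

    Dependence between outcome and selection is induced by a Gaussian
    copula with correlation `rho`; rho = 0 corresponds to selection
    that is unrelated to the outcome.
    Returns (mean of all outcomes, mean of observed outcomes, response rate).
    """
    rng = random.Random(seed)
    std_normal_cdf = NormalDist().cdf
    full, observed = [], []
    for _ in range(n):
        # Correlated standard normal pair (z1, z2) with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Probability integral transform, then the Exponential(1) quantile
        # function, gives a non-Gaussian outcome margin.
        u = min(std_normal_cdf(z1), 1.0 - 1e-12)  # guard against log(0)
        y = -math.log(1.0 - u)
        full.append(y)
        # The outcome is observed only if the latent selection variable is positive.
        if z2 > 0.0:
            observed.append(y)
    return fmean(full), fmean(observed), len(observed) / n


if __name__ == "__main__":
    for rho in (0.0, 0.7):
        full_mean, obs_mean, rate = simulate(rho=rho)
        print(f"rho={rho}: full mean={full_mean:.3f}, "
              f"observed mean={obs_mean:.3f}, response rate={rate:.2f}")
```

With `rho = 0` the complete-case mean should track the full-data mean, whereas with positive `rho` larger outcomes are more likely to be observed and the complete-case mean is biased upward. A selection model aims to correct this bias by modeling the outcome and the selection mechanism jointly.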
