Nonrespondent Subsample Multiple Imputation in Two-Phase Sampling for Nonresponse

Abstract: Nonresponse is very common in epidemiologic surveys and clinical trials. Common methods for dealing with missing data (e.g., complete-case analysis, ignorable-likelihood methods, and nonignorable modeling methods) rely on untestable assumptions. Nonresponse two-phase sampling (NTS), which takes a random sample of initial nonrespondents for follow-up data collection, provides a means to reduce nonresponse bias. However, traditional weighting methods for analyzing NTS data do not make full use of auxiliary variables. This article proposes a method called nonrespondent subsample multiple imputation (NSMI), in which multiple imputation (Rubin 1987) is performed within the subsample of Phase I nonrespondents, using the additional data collected in Phase II. We illustrate the properties of the proposed method by simulation and apply it to a quality of life study. The simulation study shows that the gains from using the NTS scheme can be substantial, even if NTS collects data from only a small proportion of the initial nonrespondents.
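The idea of imputing only within the Phase I nonrespondents can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it assumes a single fully observed auxiliary variable `x`, a normal linear imputation model fitted to the followed-up nonrespondents, the population mean of `y` as the target, and Rubin's combining rules for the final estimate. The function name `nsmi_mean`, the simulated data, and all parameter values are hypothetical.

```python
import math
import random
import statistics


def nsmi_mean(x, y, responded, followed_up, n_imputations=20, seed=0):
    """Impute y only among Phase I nonrespondents, using Phase II follow-up data.

    Returns the combined estimate of the mean of y and its standard error.
    """
    rng = random.Random(seed)
    n = len(y)
    # Nonrespondents with Phase II data (used to fit) and without (to impute).
    fit = [i for i in range(n) if not responded[i] and followed_up[i]]
    miss = [i for i in range(n) if not responded[i] and not followed_up[i]]

    # Simple linear regression y ~ a + b * (x - x_bar) on followed-up nonrespondents.
    # Centering x makes the intercept and slope estimates uncorrelated.
    m = len(fit)
    x_bar = statistics.fmean(x[i] for i in fit)
    y_bar = statistics.fmean(y[i] for i in fit)
    sxx = sum((x[i] - x_bar) ** 2 for i in fit)
    b_hat = sum((x[i] - x_bar) * (y[i] - y_bar) for i in fit) / sxx
    rss = sum((y[i] - y_bar - b_hat * (x[i] - x_bar)) ** 2 for i in fit)
    df = m - 2

    estimates, variances = [], []
    for _ in range(n_imputations):
        # Draw model parameters (flat-prior normal model), then draw imputations.
        sigma2 = rss / rng.gammavariate(df / 2, 2.0)  # rss / chi-square(df) draw
        a = rng.gauss(y_bar, math.sqrt(sigma2 / m))
        b = rng.gauss(b_hat, math.sqrt(sigma2 / sxx))
        completed = list(y)
        for i in miss:
            completed[i] = a + b * (x[i] - x_bar) + rng.gauss(0.0, math.sqrt(sigma2))
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / n)

    # Rubin's combining rules: within- plus inflated between-imputation variance.
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return q_bar, math.sqrt(within + (1 + 1 / n_imputations) * between)


# Hypothetical simulated data: the true mean of y is 1.0 by construction.
sim = random.Random(1)
n = 4000
x = [sim.gauss(0, 1) for _ in range(n)]
y_true = [1.0 + xi + sim.gauss(0, 1) for xi in x]
# Nonignorable Phase I response: units with larger y are less likely to respond.
responded = [sim.random() < 1 / (1 + math.exp(yi - 1.0)) for yi in y_true]
# Phase II: follow up a random 20% of the Phase I nonrespondents.
followed_up = [(not r) and sim.random() < 0.2 for r in responded]
y_obs = [yi if (r or f) else float("nan")
         for yi, r, f in zip(y_true, responded, followed_up)]

est, se = nsmi_mean(x, y_obs, responded, followed_up)
cc_mean = statistics.fmean(yi for yi, r in zip(y_true, responded) if r)
print(f"complete-case mean: {cc_mean:.3f}")
print(f"NSMI mean: {est:.3f} (SE {se:.3f}); true mean is 1.0")
```

Because the followed-up cases are a random subsample of the nonrespondents, the imputation model is fitted on units drawn from the same population as those being imputed. This is why, in the sketch, the NSMI estimate lands much closer to the true mean than the complete-case mean does, even though Phase I response depends on the outcome itself.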