Combining non‐probability and probability survey samples through mass imputation

This paper presents theoretical results on combining non-probability and probability survey samples through mass imputation, an approach originally proposed by Rivers (2007) as sample matching, but without rigorous theoretical justification. Under suitable regularity conditions, we establish the consistency of the mass imputation estimator and derive its asymptotic variance formula. Variance estimators are developed using either linearization or the bootstrap. The finite-sample performance of the mass imputation estimator is investigated through simulation studies and an application to a non-probability sample collected by the Pew Research Center.
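
As a rough illustration of the idea, the sketch below fits an outcome model on the non-probability sample (where the study variable is observed), imputes predicted values for every unit in the probability sample (where only covariates and design weights are available), and takes the weighted mean of the imputations. The single-covariate linear model, the weight-normalised mean, the function names and the toy data are all assumptions made for illustration; they are not the paper's specification.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mass_imputation_mean(x_nonprob, y_nonprob, x_prob, w_prob):
    """Weighted mean of imputed outcomes over the probability sample.

    The outcome model is trained on the non-probability sample only;
    the probability sample contributes covariates and design weights.
    Dividing by the sum of weights (rather than a known population
    size) is an illustrative choice.
    """
    intercept, slope = fit_simple_ols(x_nonprob, y_nonprob)
    y_imputed = [intercept + slope * xi for xi in x_prob]
    return sum(w * yh for w, yh in zip(w_prob, y_imputed)) / sum(w_prob)


# Hypothetical toy population: y depends linearly on x.
rng = random.Random(0)
N = 20000
pop_x = [rng.gauss(0.0, 1.0) for _ in range(N)]
pop_y = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in pop_x]
true_mean = sum(pop_y) / N

# Non-probability sample: units with larger x are more likely to opt in,
# so the unadjusted sample mean of y is biased.
in_nonprob = [rng.random() < (0.8 if x > 0 else 0.2) for x in pop_x]
x_np = [x for x, s in zip(pop_x, in_nonprob) if s]
y_np = [y for y, s in zip(pop_y, in_nonprob) if s]
naive = sum(y_np) / len(y_np)

# Probability sample: simple random sample, y not observed, weight N / n.
n_prob = 500
idx = rng.sample(range(N), n_prob)
x_p = [pop_x[i] for i in idx]
w_p = [N / n_prob] * n_prob

estimate = mass_imputation_mean(x_np, y_np, x_p, w_p)
print("true mean:", true_mean)
print("naive non-probability mean:", naive)
print("mass imputation estimate:", estimate)
```

In this toy setting the selection into the non-probability sample depends only on the covariate, so the outcome model fitted there can be transferred to the probability sample; that transferability is an assumption of the sketch.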

[1] Chris J. Skinner, et al. Imputation under Informative Sampling, 2016.

[2] Empirical Likelihood Methods for Complex Surveys with Data Missing-by-Design, 2018.

[3] Lena Osterhagen, et al. Multiple Imputation For Nonresponse In Surveys, 2016.

[4] A. Tsiatis. Semiparametric Theory and Missing Data, 2006.

[5] Roger Tourangeau, et al. Summary Report of the AAPOR Task Force on Non-probability Sampling, 2013.

[6] Jae Kwang Kim, et al. Mass imputation for two-phase sampling, 2019.

[7] Roger Tourangeau, et al. The Science of Web Surveys, 2013.

[8] Michael R. Elliott, et al. Inference for Nonprobability Samples, 2017.

[9] Jae Kwang Kim, et al. Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework, 2019, Scandinavian Journal of Statistics, Theory and Applications.

[10] C. F. Wu, et al. Resampling Inference with Complex Survey Data, 1988.

[11] Jae Kwang Kim. Parametric fractional imputation for missing data analysis, 2011.

[12] Li‐Chun Zhang, et al. Minimal inference from incomplete 2 × 2-tables, 2018, Analysis of Integrated Data.

[13] Jelke Bethlehem, et al. Solving the Nonresponse Problem With Sample Matching?, 2016.

[14] Richard Valliant, et al. Estimating Propensity Adjustments for Volunteer Web Surveys, 2011.

[15] D. Rubin. Inference and Missing Data, 1975.

[16] J. N. K. Rao, et al. Combining data from two independent surveys: a model-assisted approach, 2012.

[17] M. Thompson, et al. Sampling Theory and Practice, 2020, ICSA Book Series in Statistics.

[18] Richard K. Crump, et al. Dealing with limited overlap in estimation of average treatment effects, 2009.

[19] Sharon L. Lohr, et al. Combining Survey Data with Other Data Sources, 2017.

[20] Pengfei Li, et al. Doubly Robust Inference With Nonprobability Survey Samples, 2018, Journal of the American Statistical Association.

[21] Douglas Rivers, et al. The 2006 Cooperative Congressional Election Study, 2008.

[22] Jae Kwang Kim, et al. Predictive mean matching imputation in survey sampling, 2017, 1703.10256.

[23] Ronald H. Randles, et al. On the Asymptotic Normality of Statistics with Estimated Parameters, 1982.

[24] Thomas A. Louis, et al. Perils and potentials of self‐selected entry to epidemiological studies and surveys, 2016.

[25] James O. Chipperfield, et al. Combining Household Surveys Using Mass Imputation to Estimate Population Totals, 2012.

[26] R. Valliant, et al. General Regression Estimation Adjusted for Undercoverage and Estimated Control Totals, 2016.

[27] Wayne A. Fuller, et al. Two-Phase Estimation by Imputation, 2002.

[28] Douglas Rivers, et al. Sampling for Web Surveys, 2007, Handbook of Web Surveys.

[29] J. Michael Brick. Compositional Model Inference, 2015.

[30] Sunghee Lee, et al. Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment, 2009.