Integrating Probability and Nonprobability Samples for Survey Inference

Survey data collection costs have risen to the point where many survey researchers and polling companies are abandoning large, expensive probability-based samples in favor of less expensive nonprobability samples. The empirical literature suggests this strategy may be suboptimal for several reasons. Among them is that probability samples tend to be more accurate than nonprobability samples when assessed against population benchmarks. Nonprobability samples are nonetheless often preferred for their convenience and low cost. Instead of forgoing probability sampling entirely, we propose a method, within a Bayesian inferential framework, that combines probability and nonprobability samples so that the strengths of each offset the weaknesses of the other. Using simulated data, we evaluate supplementing inferences based on small probability samples with prior distributions derived from nonprobability data. We demonstrate that informative priors based on nonprobability data can reduce the variances and mean squared errors of estimated linear model coefficients. The method is also illustrated with real probability and nonprobability survey data. The paper concludes with a discussion of these findings, their implications for survey practice, and possible extensions of this research.
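The abstract does not spell out how the priors are constructed, so the following is only a minimal sketch under stated assumptions. It assumes a simple linear model y = b0 + b1*x + e with a known error variance sigma^2. It places a normal prior N(m0, V0) on (b0, b1), with m0 and V0 taken from an OLS fit to a large nonprobability sample. It then applies the standard conjugate update with a small probability sample: V_n = (V0^{-1} + X'X / sigma^2)^{-1} and m_n = V_n (V0^{-1} m0 + X'y / sigma^2). The function names, the simulated data, and the 0.1 slope bias in the nonprobability sample are hypothetical illustrations, not the paper's setup. Pure Python is used so the 2x2 linear algebra stays visible.

```python
import random

# Minimal sketch (assumptions, not the paper's specification):
#   * model y = b0 + b1*x + e with KNOWN error variance SIGMA2
#   * prior N(m0, V0) on (b0, b1) built from OLS on a nonprobability sample
#   * conjugate normal update with a small probability sample
SIGMA2 = 1.0  # assumed known residual variance


def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def xtx_xty(xs, ys):
    """X'X and X'y for a design matrix with an intercept column and one covariate."""
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return [[n, sx], [sx, sxx]], [sy, sxy]


def ols(xs, ys):
    """OLS coefficients and their covariance SIGMA2 * (X'X)^{-1}."""
    xtx, xty = xtx_xty(xs, ys)
    xtx_inv = inv2(xtx)
    cov = [[SIGMA2 * v for v in row] for row in xtx_inv]
    return matvec(xtx_inv, xty), cov


def posterior(prior_mean, prior_cov, xs, ys):
    """Conjugate update of a N(prior_mean, prior_cov) prior with data (xs, ys)."""
    xtx, xty = xtx_xty(xs, ys)
    p0 = inv2(prior_cov)  # prior precision V0^{-1}
    prec = [[p0[i][j] + xtx[i][j] / SIGMA2 for j in range(2)] for i in range(2)]
    cov = inv2(prec)  # posterior covariance V_n
    rhs_prior = matvec(p0, prior_mean)
    rhs = [rhs_prior[k] + xty[k] / SIGMA2 for k in range(2)]
    return matvec(cov, rhs), cov  # posterior mean m_n and V_n


def simulate(n, b0, b1, rng):
    """Hypothetical data: x ~ N(0, 1), y = b0 + b1*x + N(0, SIGMA2) noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [b0 + b1 * x + rng.gauss(0.0, SIGMA2 ** 0.5) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(2024)
    # Large nonprobability sample with a slightly biased slope (2.1 vs 2.0)
    # to mimic selection bias; small probability sample from the true model.
    nonprob_x, nonprob_y = simulate(1000, 1.0, 2.1, rng)
    prob_x, prob_y = simulate(50, 1.0, 2.0, rng)

    prior_mean, prior_cov = ols(nonprob_x, nonprob_y)
    beta_ols, cov_ols = ols(prob_x, prob_y)
    beta_post, cov_post = posterior(prior_mean, prior_cov, prob_x, prob_y)

    print("probability-only slope:", round(beta_ols[1], 3), "var", round(cov_ols[1][1], 4))
    print("posterior slope:       ", round(beta_post[1], 3), "var", round(cov_post[1][1], 4))
```

Because the prior precision V0^{-1} is positive definite, the posterior variances are always smaller than the probability-only OLS variances. Whether the mean squared error also falls depends on how large the nonprobability bias is relative to that variance reduction. Note also that with V0 set exactly to the nonprobability OLS covariance, the posterior mean coincides with OLS on the pooled samples, so any nonprobability bias carries straight through. Inflating V0 by a factor greater than one is one hypothetical way to down-weight the nonprobability information.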
