Computationally efficient Bayesian unit-level models for non-Gaussian data under informative sampling with application to estimation of health insurance coverage

Statistical estimates from survey samples have traditionally been obtained via design-based estimators. These estimators often work well for quantities such as population totals or means, but can become unreliable when sample sizes are small. In today's "information age," there is strong demand for more granular estimates. To meet this demand, we propose a computationally efficient unit-level modeling approach, based on a Bayesian pseudo-likelihood, for non-Gaussian data collected under informative sampling designs. Specifically, we focus on binary and multinomial data. Our approach is both multivariate and multiscale, and it incorporates spatial dependence at the area level. We illustrate the approach through an empirical simulation study and through a motivating application to health insurance estimates using the American Community Survey.
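
The abstract does not spell out the form of the pseudo-likelihood. As a minimal sketch, assume the commonly used weighted form, in which each unit's likelihood contribution is raised to the power of its survey weight, with weights rescaled to sum to the sample size. For a binary response with a logit link, the log pseudo-likelihood could then be computed as follows. The function names, the logit link, and the weight scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def scale_weights(w):
    """Rescale survey weights so they sum to the sample size n (assumed convention)."""
    w = np.asarray(w, dtype=float)
    return w * (w.size / w.sum())


def bernoulli_pseudo_loglik(beta, X, y, w):
    """Weighted Bernoulli log-likelihood: sum_i w_i * log f(y_i | x_i, beta).

    Assumes a logit link, so log f(y_i) = y_i * eta_i - log(1 + exp(eta_i)),
    where eta_i = x_i' beta.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    eta = X @ np.asarray(beta, dtype=float)
    # np.logaddexp(0, eta) evaluates log(1 + exp(eta)) in a numerically stable way
    loglik_i = y * eta - np.logaddexp(0.0, eta)
    return float(np.sum(w * loglik_i))


# Toy data: three sampled units with an intercept and one covariate (made up)
X_toy = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
y_toy = np.array([1.0, 0.0, 1.0])
w_toy = scale_weights([10.0, 30.0, 20.0])  # hypothetical survey weights
beta_toy = np.array([0.2, -0.3])

pseudo_ll = bernoulli_pseudo_loglik(beta_toy, X_toy, y_toy, w_toy)
```

With all weights equal to one, this reduces to the ordinary Bernoulli log-likelihood. Under an informative design, the unequal weights are intended to make the sample contributions mimic those of the population.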
