A Bayesian analysis of a proportion under non‐ignorable non‐response

The National Health Interview Survey (NHIS) is one of the surveys used to assess the health status of the U.S. population. One indicator of the nation's health is the total number of doctor visits made by household members in the past year. We study the binary variable of at least one doctor visit versus no doctor visit by all household members, for each of the 50 states and the District of Columbia. The proportion of households with at least one doctor visit is an indicator of the health status of the U.S. population. There is a substantial number of non-respondents among the sampled households. The main issue we address is that the non-response mechanism should not be ignored, because respondents and non-respondents differ. The purpose of this work is to estimate the proportion of households with at least one doctor visit, and to investigate what adjustment needs to be made for non-ignorable non-response. We consider a non-ignorable non-response model that expresses uncertainty about ignorability through the ratio of the odds of a household doctor visit among respondents to the odds of a doctor visit among all households; this ratio varies from state to state. We use a hierarchical Bayesian selection model to accommodate this non-response mechanism. Because the parameters are only weakly identifiable, it is necessary to 'borrow strength' across states, as in small area estimation. We also perform a simulation study to compare the expansion model with an alternative expansion model, an ignorable model and a non-ignorable model. Inference for the probability of a doctor visit is generally similar across the models. Our main result is that for some of the states the non-response mechanism can be considered non-ignorable, and that 95 per cent credible intervals for the probability of a household doctor visit and for the probability that a household responds shed important light on the NHIS data.
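
To make the odds-ratio idea concrete, the following is a minimal sketch of the arithmetic only, not the hierarchical model itself. It assumes a single state, a known visit probability among respondents, and a known ratio of respondent odds to all-household odds. The function names and the numbers in the usage note are hypothetical and are not taken from the NHIS analysis.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert positive odds back to a probability."""
    if o <= 0.0:
        raise ValueError("odds must be positive")
    return o / (1.0 + o)


def overall_visit_probability(p_respondents, odds_ratio):
    """Visit probability among all households implied by the ratio.

    odds_ratio is assumed to be
        odds(visit | respondent) / odds(visit | all households),
    so a value of 1 means respondents look like all households,
    which corresponds to an ignorable mechanism.
    """
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    return prob_from_odds(odds(p_respondents) / odds_ratio)
```

For illustration, with a hypothetical respondent probability of 0.8, a ratio of 1 leaves the overall probability at 0.8, while a ratio of 2 lowers it to about 0.67. In the model described above this ratio is uncertain and varies by state, so it would be given a prior distribution rather than fixed as it is here.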
