A Bayesian test of independence in a two-way contingency table using surrogate sampling

We consider a Bayesian approach to the study of independence in a two-way contingency table obtained from a two-stage cluster sampling design. If a procedure based on single-stage simple random sampling (rather than the appropriate cluster sampling) is used to test for independence, the p-value may be too small, leading to the conclusion that the null hypothesis is false when it is, in fact, true. For many large complex surveys, the Rao–Scott corrections to the standard chi-squared (or likelihood ratio) statistic provide appropriate inference. For smaller surveys, though, the Rao–Scott corrections may not be accurate, partly because the chi-squared test itself is inaccurate. In this paper, we use a hierarchical Bayesian model to convert the observed cluster samples to simple random samples. This provides surrogate samples that can be used to derive the distribution of the Bayes factor. We demonstrate the utility of our procedure with an example and provide a simulation study that establishes our methodology as a viable alternative to the Rao–Scott approximations for relatively small two-stage cluster samples. We also show the additional insight gained by displaying the full distribution of the Bayes factor rather than relying on a single summary of it.
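
To make the quantity concrete, the sketch below computes one standard Bayes factor for independence in an r x c table under multinomial (simple random) sampling. It is an illustration only, not the procedure of the paper: the abstract does not state the priors or the exact form of the Bayes factor, so the uniform Dirichlet priors, the function names, and the `prior` parameter are all assumptions. In the setting described above, a calculation of this kind would presumably be applied to each surrogate simple random sample, giving a distribution of Bayes factors rather than a single value.

```python
from math import lgamma


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def log_marginal(counts, prior):
    """Log marginal likelihood of multinomial counts under a symmetric
    Dirichlet(prior) prior, omitting the multinomial coefficient
    (it is identical under both models and cancels in the ratio)."""
    posterior = [c + prior for c in counts]
    return log_multivariate_beta(posterior) - log_multivariate_beta([prior] * len(counts))


def log_bayes_factor_independence(table, prior=1.0):
    """Log Bayes factor of independence versus the saturated model.

    Assumed priors (not taken from the paper): symmetric Dirichlet on the
    row margins and column margins under independence, and on all cell
    probabilities under the saturated model.
    Positive values favor independence; negative values favor association.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    cells = [count for row in table for count in row]
    log_m_independence = log_marginal(row_totals, prior) + log_marginal(col_totals, prior)
    log_m_saturated = log_marginal(cells, prior)
    return log_m_independence - log_m_saturated


if __name__ == "__main__":
    balanced = [[10, 10], [10, 10]]   # counts consistent with independence
    diagonal = [[20, 0], [0, 20]]     # counts showing strong association
    print(log_bayes_factor_independence(balanced))
    print(log_bayes_factor_independence(diagonal))
```

Working on the log scale with `lgamma` avoids overflow for realistic cell counts. Applying this directly to clustered data would ignore the within-cluster correlation, which is the problem the surrogate samples are meant to address.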
