New statistics to test log-linear modeling hypothesis with no distributional specifications and clusters with homogeneous correlation

Abstract: Traditionally, the Dirichlet-multinomial distribution has been recognized as a key model for contingency tables generated by cluster sampling schemes. Other distributions, however, may also be appropriate for such tables. This paper introduces new statistics for testing log-linear modeling hypotheses without specifying the underlying distribution, when the individuals within a cluster are possibly homogeneously correlated. An estimator of the intracluster correlation coefficient, valid for unequal cluster sizes, plays a crucial role in the construction of the goodness-of-fit test statistics.
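To illustrate why the intracluster correlation matters for such tests, the following minimal sketch computes how much the Dirichlet-multinomial model inflates cell variances relative to a plain multinomial. It is not the estimator or the test statistics proposed in the paper. The function names are hypothetical, and the parameterization (intracluster correlation equal to 1 / (1 + sum of the Dirichlet parameters)) is one common convention assumed here; the paper may use a different one.

```python
def dm_intracluster_correlation(alpha):
    """Intracluster correlation of a Dirichlet-multinomial with
    Dirichlet parameters `alpha` (assumed convention: 1 / (1 + sum(alpha)))."""
    alpha0 = sum(alpha)
    return 1.0 / (1.0 + alpha0)


def dm_design_effect(alpha, n):
    """Variance inflation factor 1 + rho * (n - 1) for a cluster of size n."""
    rho = dm_intracluster_correlation(alpha)
    return 1.0 + rho * (n - 1)


def multinomial_cell_variance(alpha, n, j):
    """Variance of cell j's count under a plain multinomial with
    probabilities alpha / sum(alpha) and cluster size n."""
    p = alpha[j] / sum(alpha)
    return n * p * (1.0 - p)


def dm_cell_variance(alpha, n, j):
    """Variance of cell j's count under the Dirichlet-multinomial,
    using the closed form n * p * (1 - p) * (n + alpha0) / (1 + alpha0)."""
    alpha0 = sum(alpha)
    p = alpha[j] / alpha0
    return n * p * (1.0 - p) * (n + alpha0) / (1.0 + alpha0)


if __name__ == "__main__":
    alpha = [1.0, 2.0, 1.0]  # illustrative values, not taken from the paper
    n = 5
    print("rho    :", dm_intracluster_correlation(alpha))
    print("deff   :", dm_design_effect(alpha, n))
    print("var DM :", dm_cell_variance(alpha, n, 0))
    print("var MN :", multinomial_cell_variance(alpha, n, 0))
```

Under this convention the Dirichlet-multinomial cell variance equals the multinomial variance times the design effect, which is why an uncorrected chi-squared statistic tends to be too large when clusters are correlated.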
