A goodness of fit test for sparse 2^p contingency tables.

When a model is fitted to data in a 2^p contingency table, many cells are likely to have very small expected frequencies. This sparseness invalidates the usual approximation to the distribution of the chi-squared or log-likelihood tests of goodness of fit. We present a solution to this problem by proposing a test based on a comparison of the observed and expected frequencies of the second-order margins of the table. A chi-squared approximation to the sampling distribution of the test statistic is provided using asymptotic moments. This can be straightforwardly calculated from the expected cell frequencies. The new test is applied to several previously published examples relating to the fitting of latent variable models, but its application is quite general.
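As a rough illustration of comparing observed and expected second-order margins, the Python sketch below collapses a 2^p table onto each pair of variables and sums a Pearson-type discrepancy over the pairs. It is only a sketch under stated assumptions: the function names `second_order_margins` and `margin_discrepancy`, the choice of the (1, 1) cell as the margin for each pair, and the toy frequencies are all hypothetical, and the moment-based scaling that yields the chi-squared approximation is not reproduced here.

```python
from itertools import combinations, product


def second_order_margins(cell_values, p):
    """Collapse a 2^p table onto each pair of variables.

    cell_values lists one frequency per response pattern, ordered as
    itertools.product((0, 1), repeat=p). For each pair i < j the result
    holds the total frequency of patterns with x_i = x_j = 1
    (an assumed convention for this sketch).
    """
    patterns = list(product((0, 1), repeat=p))
    if len(cell_values) != len(patterns):
        raise ValueError("expected 2**p cell values")
    margins = {}
    for i, j in combinations(range(p), 2):
        margins[(i, j)] = sum(
            value
            for pattern, value in zip(patterns, cell_values)
            if pattern[i] == 1 and pattern[j] == 1
        )
    return margins


def margin_discrepancy(observed, expected, p):
    """Unscaled Pearson-type sum over the second-order margins.

    Illustrative only: this is not the statistic proposed in the paper,
    and it should not be referred to a chi-squared table as it stands.
    """
    obs = second_order_margins(observed, p)
    exp = second_order_margins(expected, p)
    return sum((obs[pair] - exp[pair]) ** 2 / exp[pair] for pair in obs)


# Toy data for p = 3 (hypothetical frequencies, n = 80).
observed = [12, 8, 10, 10, 10, 10, 6, 14]
expected = [10.0] * 8  # e.g. fitted frequencies from some model

print(second_order_margins(observed, 3))
print(margin_discrepancy(observed, expected, 3))
```

Even when many of the 2^p cells have tiny expected frequencies, each pairwise margin pools 2^(p-2) cells, so the margin frequencies stay comparatively large. That pooling is the reason a margin-based comparison is attractive for sparse tables.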