Goodness-of-Fit Tests for Large Sparse Multinomial Distributions

Abstract. A goodness-of-fit statistic D² is introduced for use in multinomial distributions. Pearson's X² and D² are both approximately normally distributed when the sample size N is not large relative to the number of multinomial categories k. Under sequences of local alternative hypotheses the test based on D² exhibits moderate power when the X² test is biased. Application is made to the analysis of large sparse contingency tables. A theorem is presented that describes the likelihood ratio Λ for testing a simple multinomial distribution against a mixture of multinomial distributions. A wide variety of mixing distributions is considered, and the D² statistic is a special case of log Λ when testing for Dirichlet mixtures of multinomial distributions. In the case where N is large, relative to k, X² and D² + k behave approximately as chi-squared random variables and differ by a very small amount under the null hypothesis. In this situation X² and D² + k will generally yield the same inference. With a l...
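
The relationship between Pearson's statistic and D² + k described in the abstract can be illustrated numerically. The sketch below is not taken from this excerpt, which does not state a formula for D². It assumes the definition D² = Σ [(n_i − N p_i)² − n_i] / (N p_i), where n_i are the observed counts and p_i the null probabilities. The function names, the sampling helper, and the values of k and N are hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def pearson_x2(counts, probs):
    """Pearson's chi-squared statistic: sum of (n_i - N p_i)^2 / (N p_i)."""
    n_total = sum(counts)
    return sum((n - n_total * p) ** 2 / (n_total * p)
               for n, p in zip(counts, probs))


def d2_statistic(counts, probs):
    """ASSUMED definition of D^2 (not given in the excerpt):
    sum of [(n_i - N p_i)^2 - n_i] / (N p_i)."""
    n_total = sum(counts)
    return sum(((n - n_total * p) ** 2 - n) / (n_total * p)
               for n, p in zip(counts, probs))


def sample_counts(n_total, probs, seed=0):
    """Draw one multinomial sample of size n_total and return the cell counts."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(probs)), weights=probs, k=n_total)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(probs))]


# Hypothetical sparse setting: many more categories than observations.
k, n_total = 1000, 200
uniform = [1.0 / k] * k
counts = sample_counts(n_total, uniform)

x2 = pearson_x2(counts, uniform)
d2 = d2_statistic(counts, uniform)
print(f"X^2 = {x2:.3f}, D^2 + k = {d2 + k:.3f}")
```

Under this assumed definition, X² − D² equals Σ n_i / (N p_i). For a uniform null this sum is exactly k, so the two printed values agree up to rounding. For a non-uniform null the sum has expectation k under the null hypothesis, so the two quantities differ by a random amount.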