Testing for Association in Contingency Tables with Multiple Column Responses

SUMMARY

In many studies, multiple categorical responses or measurements are made on members of different populations or treatment groups. This often arises in surveys in which individuals may mark all answers that apply when responding to a multiple-choice question. It is frequently of interest to determine whether the distributions of responses differ among groups. Because a single individual can then contribute to several cells, the test statistic of the usual Pearson chi-square test no longer measures a scaled distance between observed and hypothesized cell counts in a contingency table, and its distribution is no longer the familiar chi-square. This paper presents a modification to the Pearson statistic that measures the appropriate distance for multiple-response tables. The asymptotic distribution of the modified statistic is shown to be that of a linear combination of chi-square random variables with coefficients depending on the true probabilities. Because these probabilities are unknown in practice, a bootstrap resampling method is proposed instead to obtain the null-hypothesis sampling distribution. Simulations show that this bootstrap method maintains its nominal size under a variety of circumstances, while a naively applied Pearson chi-square test is severely affected by multiple responses.
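The following Python sketch illustrates the two ingredients summarized above, under stated assumptions. It assumes the modified statistic takes the form of a sum, over the answer items, of the ordinary Pearson statistic for each group-by-(marked, not marked) table. This form is an assumption made for illustration and is not necessarily the exact statistic defined in the paper. The bootstrap resamples whole response vectors from the pooled sample while keeping each group's size fixed, which mimics the null hypothesis of no group differences and preserves the dependence among items within a respondent. The function names `modified_pearson` and `bootstrap_pvalue` are hypothetical.

```python
import random
from collections import defaultdict


def modified_pearson(groups, responses):
    """Assumed statistic: sum over items of the Pearson X^2 for the
    group-by-(marked, not marked) table of that item.

    groups: list of group labels, one per respondent.
    responses: list of equal-length 0/1 tuples; responses[k][j] == 1 if
        respondent k marked item j.
    """
    n_items = len(responses[0])
    n_total = len(responses)
    group_sizes = defaultdict(int)
    marked = defaultdict(lambda: [0] * n_items)
    for g, r in zip(groups, responses):
        group_sizes[g] += 1
        for j, x in enumerate(r):
            marked[g][j] += x

    stat = 0.0
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n_total  # pooled marking rate
        if p in (0.0, 1.0):
            continue  # constant item: expected counts would be zero
        for g, n_g in group_sizes.items():
            expected = n_g * p
            # Both columns of the 2-column table combine into one term:
            # (m - n p)^2 / (n p (1 - p)).
            stat += (marked[g][j] - expected) ** 2 / (expected * (1.0 - p))
    return stat


def bootstrap_pvalue(groups, responses, n_boot=999, seed=0):
    """Bootstrap p-value for the assumed statistic under H0 (no group effect)."""
    rng = random.Random(seed)
    observed = modified_pearson(groups, responses)
    exceed = 0
    for _ in range(n_boot):
        # Draw whole response vectors from the pooled sample and assign them
        # to the original group labels, so group sizes stay fixed.
        boot = [rng.choice(responses) for _ in responses]
        if modified_pearson(groups, boot) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)
```

For example, with `groups = ["A"] * 4 + ["B"] * 4` and responses `(1,0,1), (1,1,0), (1,0,0), (1,1,1)` for group A and `(0,0,1), (0,1,0), (0,0,1), (0,1,1)` for group B, the statistic is 128/15 (about 8.53). Nearly all of it comes from the first item, which separates the groups perfectly. Resampling entire response vectors, rather than individual cells, is the design choice that respects each respondent's multiple answers.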