A Warning on the Use of Chi-Squared Statistics with Frequency Tables with Small Expected Cell Counts

Abstract

When applied to frequency tables with small expected cell counts, Pearson chi-squared test statistics may be asymptotically inconsistent, even when a satisfactory chi-squared approximation exists for their distribution under the null hypothesis. This problem is particularly important when the number of cells is large and the expected cell counts are quite variable. To illustrate this bias of the chi-squared test, this article considers the Pearson chi-squared test of the hypothesis that the cell probabilities of a multinomial frequency table have specified values. In this case, the expected value and variance of the Pearson chi-squared statistic may be evaluated under both the null and alternative hypotheses. When the number of cells is large, normal approximations and discrete Edgeworth expansions may also be used to assess the size and power of the Pearson chi-squared test. These analyses show that unless all cell probabilities are equal, it is possible to select a significance lev...
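The kind of bias described in the abstract can be seen in a very small multinomial table. The sketch below is an illustration only, not the article's own analysis: the function names, the cell probabilities, the sample size of 5, and the use of the nominal 5% critical value 5.991 (chi-squared with 2 degrees of freedom) are all assumptions chosen for the example. It computes the exact mean of the Pearson statistic from the standard multinomial moments, and the exact rejection probability by enumerating every possible table.

```python
from itertools import product
from math import factorial, prod


def pearson_chi2(counts, null_probs):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, null_probs))


def expected_pearson_chi2(n, null_probs, true_probs):
    """Exact mean of the Pearson statistic when the data follow true_probs.

    Uses E(n_i - n p_i)^2 = n q_i (1 - q_i) + n^2 (q_i - p_i)^2 for each cell.
    """
    return sum(
        (q * (1 - q) + n * (q - p) ** 2) / p
        for p, q in zip(null_probs, true_probs)
    )


def rejection_probability(n, null_probs, true_probs, critical_value):
    """Exact P(statistic > critical_value), enumerating all tables with total n.

    Only practical for small n and few cells.
    """
    total = 0.0
    for counts in product(range(n + 1), repeat=len(null_probs)):
        if sum(counts) != n:
            continue
        if pearson_chi2(counts, null_probs) > critical_value:
            coef = factorial(n)
            for c in counts:
                coef //= factorial(c)  # multinomial coefficient, exact at each step
            total += coef * prod(q ** c for q, c in zip(true_probs, counts))
    return total


# Hypothetical example values (not taken from the article).
null_probs = [0.98, 0.01, 0.01]   # highly unequal cell probabilities
alt_probs = [0.99, 0.005, 0.005]  # alternative that shifts mass to the large cell
n = 5
critical_value = 5.991            # nominal 5% point of chi-squared with 2 df

print("mean under null:       ", expected_pearson_chi2(n, null_probs, null_probs))
print("mean under alternative:", expected_pearson_chi2(n, null_probs, alt_probs))
print("size: ", rejection_probability(n, null_probs, null_probs, critical_value))
print("power:", rejection_probability(n, null_probs, alt_probs, critical_value))
```

With these assumed values the test rejects whenever any observation falls in one of the two small cells, so the exact size is 1 - 0.98^5 (about 0.096) while the exact power against the alternative is 1 - 0.99^5 (about 0.049). The power is below the size, which is what bias of a test means. The sample is far too small for the chi-squared approximation to be accurate, so the example shows only the direction of the effect, not its size in realistic tables.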
