Pearson's goodness-of-fit tests for sparse distributions

Abstract. Pearson’s chi-squared test is widely used to test the goodness of fit between categorical data and a given discrete distribution. When the number of categories, say k, is a fixed integer, Pearson’s chi-squared test statistic converges in distribution to a chi-squared distribution with k − 1 degrees of freedom as the sample size n goes to infinity. In real applications, the number k often changes with n and may even be much larger than n. Using martingale techniques, we prove that Pearson’s chi-squared test statistic is asymptotically normal under quite general conditions. We also propose a new test statistic which, in our simulation study, is more powerful than the chi-squared test statistic. A real application to lottery data is provided to illustrate our methodology.
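The sparse regime described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes an equiprobable null, and it centres and scales Pearson's statistic by k − 1 and sqrt(2(k − 1)), the mean and approximate standard deviation in that special case. The function names `pearson_chi2`, `standardized_chi2`, and `simulate_null` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def pearson_chi2(counts, probs):
    """Pearson's statistic: sum over cells of (observed - expected)^2 / expected."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    n = counts.sum()
    expected = n * probs
    return float(np.sum((counts - expected) ** 2 / expected))


def standardized_chi2(counts, probs):
    """Centre by k - 1 and scale by sqrt(2(k - 1)).

    Assumption: this scaling suits the equiprobable null used below;
    the normalisation in the paper may differ.
    """
    k = len(probs)
    return (pearson_chi2(counts, probs) - (k - 1)) / np.sqrt(2.0 * (k - 1))


def simulate_null(n, k, reps, seed=0):
    """Draw `reps` multinomial samples of size n over k equiprobable cells
    and return the standardized statistic for each sample."""
    rng = np.random.default_rng(seed)
    probs = np.full(k, 1.0 / k)
    samples = rng.multinomial(n, probs, size=reps)  # shape (reps, k)
    return np.array([standardized_chi2(row, probs) for row in samples])


if __name__ == "__main__":
    # Sparse setting: far more cells (k) than observations (n).
    z = simulate_null(n=200, k=1000, reps=2000)
    print("mean of standardized statistic:", z.mean())
    print("std of standardized statistic: ", z.std())
```

With k much larger than n, most cells are empty, yet the standardized values should cluster around zero with spread close to one, which is consistent with a normal limit rather than a chi-squared limit with fixed degrees of freedom.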
