Pearson's goodness-of-fit tests for sparse distributions

Abstract. Pearson’s chi-squared test is widely used to test the goodness of fit between categorical data and a given discrete distribution. When the number of categories, say k, is a fixed integer, Pearson’s chi-squared test statistic converges in distribution to a chi-squared distribution with k − 1 degrees of freedom as the sample size n goes to infinity. In real applications, k often changes with n and may even be much larger than n. Using martingale techniques, we prove that Pearson’s chi-squared test statistic is asymptotically normal under quite general conditions. We also propose a new test statistic which, in our simulation study, is more powerful than the chi-squared test statistic. A real application to lottery data is provided to illustrate our methodology.
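As a rough illustration of the setting described in the abstract, the following Python sketch computes Pearson’s chi-squared statistic for a sparse sample (k much larger than n) and standardizes it with the mean k − 1 and variance 2(k − 1) of the classical chi-squared limit. The function names, the uniform null distribution, and this particular centering and scaling are assumptions made for illustration only; they are not the normalization or the new test statistic from the paper.

```python
import math
import random
from collections import Counter


def pearson_statistic(counts, probs):
    """Pearson's X^2 = sum over cells of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def standardized_statistic(counts, probs):
    """Center and scale X^2 by the chi-squared(k - 1) mean and variance.

    Assumption: this is a simple illustrative standardization, not the
    normalization derived in the paper.
    """
    k = len(probs)
    x2 = pearson_statistic(counts, probs)
    return (x2 - (k - 1)) / math.sqrt(2 * (k - 1))


def upper_tail_normal(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical sparse example: k = 1000 cells, only n = 200 observations,
# sampled from the uniform null distribution itself.
rng = random.Random(0)
k, n = 1000, 200
probs = [1.0 / k] * k
draws = Counter(rng.randrange(k) for _ in range(n))
counts = [draws.get(i, 0) for i in range(k)]

x2 = pearson_statistic(counts, probs)
z = standardized_statistic(counts, probs)
p_value = upper_tail_normal(z)
print(f"X^2 = {x2:.2f}, z = {z:.3f}, approximate p-value = {p_value:.3f}")
```

With most expected cell counts far below 1, as here, the chi-squared reference distribution with k − 1 degrees of freedom is not a reliable approximation, which is the regime the abstract addresses.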

Shuhua Chang | Deli Li | Yongcheng Qi
