χ² and classical exact tests often wildly misreport significance; the remedy lies in computers

If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ² statistic can involve division by nearly zero. This often leads to serious trouble in practice (even in the absence of round-off errors), as the present article illustrates via numerous examples. Fortunately, with the now widespread availability of computers, avoiding all the trouble is simple and easy: without the problematic division by nearly zero, the actual values taken by goodness-of-fit statistics are not humanly interpretable, but black-box computer programs can rapidly calculate their precise significance.

Supported in part by NSF Grant OISE-0730136 and an NSF Postdoctoral Research Fellowship.
Supported in part by a Research Fellowship from the Alfred P. Sloan Foundation.
Supported in part by an NSF Postdoctoral Research Fellowship and a Donald D. Harrington Faculty Fellowship.
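To make the abstract's point concrete, the following minimal Python sketch computes the Pearson χ² statistic (which divides each squared discrepancy by the expected count) alongside the plain sum of squared discrepancies (which omits that division), and estimates the significance of each by simulating draws from the model. Monte Carlo simulation is assumed here as one plausible "black-box" way to obtain significance; the abstract does not specify the procedure, and the function names, the example model, and the example counts are all hypothetical.

```python
import random


def chi_square(counts, probs):
    """Pearson statistic: squared discrepancies divided by expected counts."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def sum_of_squares(counts, probs):
    """Same squared discrepancies, without the division by expected counts."""
    n = sum(counts)
    return sum((c - n * p) ** 2 for c, p in zip(counts, probs))


def draw_counts(n, probs, rng):
    """Draw n i.i.d. samples from the model and tally them per bin."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return counts


def monte_carlo_p_value(statistic, observed, probs, trials=2000, seed=0):
    """Fraction of simulated data sets whose statistic is at least as large
    as the observed one (an assumed, simulation-based significance level)."""
    rng = random.Random(seed)
    n = sum(observed)
    threshold = statistic(observed, probs)
    hits = sum(
        statistic(draw_counts(n, probs, rng), probs) >= threshold
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Hypothetical model far from uniform: the last bin has expected count
    # 100 * 0.001 = 0.1, so chi-square divides by a number close to zero there.
    model = [0.9, 0.09, 0.009, 0.001]
    observed = [80, 17, 2, 1]  # hypothetical data, 100 draws in total
    for stat in (chi_square, sum_of_squares):
        print(stat.__name__, monte_carlo_p_value(stat, observed, model))
```

In this sketch the raw value of `sum_of_squares` has no familiar reference distribution, which is why its significance is obtained by simulation rather than from a table. The same simulation also gives a significance level for `chi_square` that does not rely on the asymptotic χ² approximation.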
