Statistical test for the comparison of samples from mutational spectra

A Monte Carlo estimate of the p value of the hypergeometric test is described and advocated for testing the hypothesis that different treatments induce the same mutational spectrum. The hypergeometric test is a generalization of Fisher's "exact" test to tables with more than two rows or more than two columns. Use of the test is demonstrated by the analysis of data from the characterization of nonsense mutations in the lacI gene of Escherichia coli. Unlike the chi-square test, the hypergeometric test remains valid when applied to sparse cross-classification tables. The hypergeometric test has the greatest discriminating power of any statistical test that could be employed routinely to compare samples from mutational spectra. Direct application of the hypergeometric test to large cross-classification tables is excessively computation-intensive, but estimation of its p value via Monte Carlo techniques is practical.
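The Monte Carlo idea can be illustrated with a minimal Python sketch. It is not the authors' program; it is one plausible reading of the procedure under stated assumptions. Rows are taken to be mutation sites (or classes) and columns to be treatments, random tables with the observed margins are generated by shuffling treatment labels among the mutants, and the p value is estimated as the fraction of simulated tables that are no more probable than the observed one. The function names, the default number of simulations, and the floating-point tolerance are all hypothetical choices made for illustration.

```python
import math
import random


def log_prob_kernel(table):
    """Return -sum(log(n_ij!)) over all cells.

    With both margins fixed, the hypergeometric probability of a table is
    proportional to 1 / prod(n_ij!), so this value ranks tables by probability.
    """
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def monte_carlo_hypergeometric_p(table, n_sim=10000, seed=0):
    """Estimate the hypergeometric-test p value of an r x c table of counts.

    Assumption: the estimate is the simple proportion of simulated tables
    whose probability does not exceed that of the observed table.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])

    # Expand the table into one (row label, column label) pair per mutant.
    row_labels, col_labels = [], []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            row_labels.extend([i] * n)
            col_labels.extend([j] * n)

    observed = log_prob_kernel(table)
    tol = 1e-9  # guards against floating-point ties being missed
    hits = 0
    for _ in range(n_sim):
        # Shuffling column labels keeps both margins fixed and samples
        # tables from the hypergeometric distribution under the null.
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if log_prob_kernel(sim) <= observed + tol:
            hits += 1
    return hits / n_sim
```

A call such as `monte_carlo_hypergeometric_p(counts, n_sim=10000)` takes `counts` as a list of rows of non-negative integers. The cost grows with the number of mutants and the number of simulations rather than with the number of possible tables, which is what makes the approach practical for large, sparse tables.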