The split-up algorithm: a fast symbolic method for computing p-values of distribution-free statistics

For many distribution-free statistics, computing exact p-values under the null hypothesis is computationally intensive. When sample sizes are small or the number of ties is large, approximations are often unsatisfactory. Moreover, tables of exact critical values are not available for conditional rank statistics (for example, under ties or censoring), for rank statistics with arbitrary regression constants, or for permutation test statistics. In these cases it is important to have a fast algorithm for computing exact p-values. We present a new algorithm and apply it to a large class of distribution-free one-sample, two-sample and serial statistics. The algorithm is based on splitting the probability generating function of the test statistic into two parts. We compare the speed of this “split-up algorithm” with that of existing procedures and conclude that the new algorithm is faster in many cases.
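The idea of splitting the probability generating function (PGF) can be illustrated on the Wilcoxon signed-rank statistic W+ without ties, whose null PGF is the product over i = 1..n of (1 + z^i)/2. The sketch below is a simplified meet-in-the-middle reading of that idea, not the authors' implementation: it assumes no ties, splits the product at a hypothetical index `split` (default n // 2), expands each half separately, and combines them only for the requested upper tail P(W+ >= t), so the full product is never formed. The function names `expand`, `split_up_upper_tail` and `direct_upper_tail` are illustrative.

```python
from fractions import Fraction


def expand(exponents):
    """Integer coefficients of prod_{e in exponents} (1 + z**e)."""
    coeffs = [1]
    for e in exponents:
        new = coeffs + [0] * e          # the "1" term keeps existing coefficients
        for k, c in enumerate(coeffs):
            new[k + e] += c             # the "z**e" term shifts them by e
        coeffs = new
    return coeffs


def split_up_upper_tail(n, t, split=None):
    """Exact P(W+ >= t) for the signed-rank statistic (no ties), via a split PGF.

    Assumption: the PGF is split into factors 1..m and m+1..n.
    """
    m = n // 2 if split is None else split
    a = expand(range(1, m + 1))         # first part of the PGF (counts)
    b = expand(range(m + 1, n + 1))     # second part of the PGF (counts)
    # suffix[k] = number of outcomes of the second part with sum >= k
    suffix = [0] * (len(b) + 1)
    for k in range(len(b) - 1, -1, -1):
        suffix[k] = suffix[k + 1] + b[k]
    count = 0
    for i, ai in enumerate(a):
        need = max(t - i, 0)            # second part must contribute at least this
        if ai and need <= len(b):
            count += ai * suffix[need]
    return Fraction(count, 2 ** n)


def direct_upper_tail(n, t):
    """Reference: expand the full PGF and sum its upper tail."""
    c = expand(range(1, n + 1))
    return Fraction(sum(c[max(t, 0):]), 2 ** n)


print(split_up_upper_tail(10, 45))  # prints 43/1024
```

Each half has far fewer terms than the full product, and the combination step needs only one pass over the first half plus a table of tail sums for the second. How the split is chosen, and how ties or other statistics are handled, is left open here.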