Generalized Spacing-Statistics and a New Family of Non-Parametric Tests

Random divisions of an interval arise in various contexts, including statistics, physics, and geometric analysis. For testing the uniformity of a random partition of the unit interval $[0,1]$ into $k$ disjoint subintervals of size $(S_k[1],\ldots,S_k[k])$, Greenwood (1946) suggested using the squared $\ell_2$-norm of this size vector as a test statistic, prompting a number of subsequent studies. Despite much progress on understanding its power and asymptotic properties, attempts to find its exact distribution have so far succeeded only for small values of $k$. Here, we develop an efficient method to compute the distribution of the Greenwood statistic and more general spacing-statistics for an arbitrary value of $k$. Specifically, we consider random divisions of $\{1,2,\dots,n\}$ into $k$ subsets of consecutive integers and study $\|S_{n,k}\|^p_{p,w}$, the $p$th power of the weighted $\ell_p$-norm of the subset size vector $S_{n,k}=(S_{n,k}[1],\ldots,S_{n,k}[k])$ for arbitrary weights $w=(w_1,\ldots,w_k)$. We present an exact and quickly computable formula for its moments, as well as a simple algorithm to accurately reconstruct a probability distribution using the moment sequence. We also study various scaling limits, one of which corresponds to the Greenwood statistic in the case of $p=2$ and $w=(1,\ldots,1)$, and this connection allows us to obtain information about regularity, monotonicity and local behavior of its distribution. Lastly, we devise a new family of non-parametric tests using $\|S_{n,k}\|^p_{p,w}$ and demonstrate that they exhibit substantially improved power for a large class of alternatives, compared to existing popular methods such as the Kolmogorov-Smirnov, Cramér-von Mises, and Mann-Whitney/Wilcoxon rank-sum tests.
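To make the objects above concrete, the following Python sketch evaluates $\sum_i w_i S[i]^p$ for a size vector, draws uniform spacings of $[0,1]$ for the Greenwood case ($p=2$, unit weights), and computes moments in the discrete setting by brute-force enumeration. It is an illustration only, not the moment formula or reconstruction algorithm of the paper. The function names are hypothetical, and the discrete model is assumed here to be uniform over divisions of $\{1,\dots,n\}$ into $k$ non-empty consecutive blocks, a detail the abstract does not specify.

```python
import random
from fractions import Fraction
from itertools import combinations


def weighted_lp_power(sizes, p=2, weights=None):
    """Return sum_i w_i * sizes[i]**p, the p-th power of the weighted l_p norm."""
    if weights is None:
        weights = [1] * len(sizes)  # unit weights recover the unweighted norm
    return sum(w * s ** p for w, s in zip(weights, sizes))


def uniform_spacings(k, rng):
    """Split [0, 1] into k subintervals using k - 1 independent uniform cut points."""
    cuts = sorted(rng.random() for _ in range(k - 1))
    points = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(points, points[1:])]


def compositions(n, k):
    """Yield every division of {1, ..., n} into k non-empty consecutive blocks,
    represented by the tuple of block sizes (assumption: blocks are non-empty)."""
    for cuts in combinations(range(1, n), k - 1):
        points = (0,) + cuts + (n,)
        yield tuple(b - a for a, b in zip(points, points[1:]))


def exact_moment(n, k, p=2, weights=None, order=1):
    """Moment of the given order of the statistic, by exhaustive enumeration.

    Assumes all divisions are equally likely. The cost grows like
    binomial(n - 1, k - 1), so this is only feasible for small n and k.
    """
    total = Fraction(0)
    count = 0
    for sizes in compositions(n, k):
        total += Fraction(weighted_lp_power(sizes, p, weights)) ** order
        count += 1
    return total / count


def monte_carlo_greenwood_mean(k, draws, seed=0):
    """Estimate the mean of the Greenwood statistic under uniformity by simulation."""
    rng = random.Random(seed)
    total = sum(weighted_lp_power(uniform_spacings(k, rng)) for _ in range(draws))
    return total / draws
```

Under uniformity the spacings are exchangeable with $E[S_k[i]^2] = 2/(k(k+1))$, so the Greenwood statistic has mean $2/(k+1)$; the Monte Carlo estimate can be checked against this value. The enumeration in `exact_moment` scales combinatorially, which is exactly the cost that a closed-form moment formula avoids.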

[1]  The Random Division of an Interval, Part III , 1951 .

[2]  O. Kallenberg Foundations of Modern Probability , 2021, Probability Theory and Stochastic Modelling.

[3]  Milton Abramowitz,et al.  Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 1964 .

[4]  P. Burrows Selected Percentage Points of Greenwood's Statistic , 1979 .

[5]  Å. Björck,et al.  Solution of Vandermonde Systems of Equations , 1970 .

[6]  D. Darling On a Class of Problems Related to the Random Division of an Interval , 1953 .

[7]  L. Holst A Unified Approach to Limit Theorems for Urn Models , 1979 .

[8]  J. Norris Appendix: probability and measure , 1997 .

[9]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[10]  Concentration on the $\ell_p^n$ ball , .

[11]  Jonathan Terhorst,et al.  High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability , 2018, Nature Genetics.

[12]  H. B. Mann,et al.  On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other , 1947 .

[13]  William Feller,et al.  An Introduction to Probability Theory and Its Applications , 1950 .

[14]  E. Ehrhardt,et al.  Sur un problème de géométrie diophantienne linéaire. II. , 1967 .

[15]  Lars Holst,et al.  Asymptotic Theory for some Families of Two-Sample Nonparametric Statistics. , 1976 .

[16]  L. Mead,et al.  Maximum entropy in the problem of moments , 1984 .

[17]  N. Smirnov Table for Estimating the Goodness of Fit of Empirical Distributions , 1948 .

[18]  M. Greenwood,et al.  The statistical study of infectious diseases. , 1946, Journal of the Royal Statistical Society. Series A.

[19]  Jorge Bustamante,et al.  Best one-sided L1 approximation to the Heaviside and sign functions , 2012, J. Approx. Theory.

[20]  Amanda Clare,et al.  Locational distribution of gene functional classes in Arabidopsis thaliana , 2007, BMC Bioinformatics.

[21]  W. Dixon A Criterion for Testing the Hypothesis that Two Samples are from the Same Population , 1940 .

[22]  I. Currie Further Percentage Points of Greenwood's Statistic , 1981 .

[23]  A. N. Kolmogorov Sulla determinazione empirica di una legge di distribuzione , 1933 .

[24]  N. Akhiezer,et al.  The Classical Moment Problem and Some Related Questions in Analysis , 2020 .

[25]  M. A. Stephens Further Percentage Points for Greenwood's Statistic. , 1981 .

[26]  A. Martin-Löf On the composition of elementary errors , 1994 .

[27]  L. Weiss A Certain Class of Tests of Fit , 1956 .

[28]  A. Gardner Greenwood's "Problem of Intervals": An Exact Solution for N = 3 , 1952 .

[29]  Algemeen Gedeelte Random division of an interval , 2007 .

[30]  H. Cramér On the composition of elementary errors , .