Nonparametric K-Sample Tests via Dynamic Slicing

K-sample testing problems arise in many scientific applications and have long attracted statisticians’ attention. We propose an omnibus nonparametric method based on an optimal discretization (also known as “slicing”) of the continuous random variables involved in the test. The novelty of our approach lies in a term that penalizes the number of slices (i.e., the resolution of the discretization) and thereby regularizes the corresponding likelihood-ratio test statistic. We develop an efficient dynamic programming algorithm to determine the optimal slicing scheme, and we study asymptotic and finite-sample properties of the resulting test statistic, such as its power and null distribution. We compare the proposed method with several well-known existing methods and demonstrate its statistical power through extensive simulation studies and a real data example. A dynamic slicing method for the one-sample testing problem is further developed and studied under the same framework. Supplementary materials, including technical derivations and proofs, are available online.
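
To make the idea of a penalized slicing search concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the exact statistic, so everything here is an assumption for illustration: the function name `dynamic_slicing_statistic`, a constant `penalty` charged for each slice beyond the first, a multinomial log-likelihood-ratio gain per slice, and a simple quadratic-time dynamic program over the sorted observations. Ties in the continuous variable are not treated specially.

```python
import math


def dynamic_slicing_statistic(x, labels, penalty):
    """Maximize a penalized log-likelihood ratio over contiguous slicings.

    Assumed (hypothetical) objective: the sum over slices of
    count * log(within-slice group frequency / overall group frequency),
    minus `penalty` for every slice beyond the first.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    groups = sorted(set(labels))
    index = {g: j for j, g in enumerate(groups)}
    k = len(groups)

    # prefix[i][j] = number of group-j observations among the first i sorted ones
    prefix = [[0] * k]
    for i in order:
        row = prefix[-1][:]
        row[index[labels[i]]] += 1
        prefix.append(row)
    total = prefix[n]

    def slice_gain(a, b):
        # Log-likelihood-ratio contribution of the slice holding sorted obs a..b-1
        size = b - a
        gain = 0.0
        for j in range(k):
            c = prefix[b][j] - prefix[a][j]
            if c > 0:
                gain += c * math.log((c / size) / (total[j] / n))
        return gain

    # best[b] = best penalized score for the first b sorted observations,
    # charging `penalty` for every slice (corrected for the first slice below)
    best = [0.0] + [-math.inf] * n
    for b in range(1, n + 1):
        for a in range(b):
            candidate = best[a] + slice_gain(a, b) - penalty
            if candidate > best[b]:
                best[b] = candidate
    return best[n] + penalty  # the first slice is free


# Two well-separated groups: the optimum uses two slices.
x = [0, 1, 2, 3, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
stat = dynamic_slicing_statistic(x, labels, penalty=1.0)
```

In this sketch a single slice always scores zero, so the statistic is nonnegative, and a larger `penalty` pushes the search toward coarser slicings. In practice the null distribution of such a statistic would still need to be calibrated, for example by permuting the labels.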
