TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION

We propose a new nonparametric test for equality of two or more multivariate distributions based on Euclidean distances between sample elements. Several consistent tests for comparing multivariate distributions can be developed from the underlying theoretical results. The test procedure for the multisample problem is developed and applied to testing the composite hypothesis of equal distributions when the distributions are unspecified. The proposed test is universally consistent against all fixed alternatives (not necessarily continuous) with finite second moments. The test is implemented by conditioning on the pooled sample to obtain an approximate permutation test, which is distribution-free. Our Monte Carlo power study suggests that the new test may be much more sensitive than tests based on nearest neighbors against several classes of alternatives, and that it performs particularly well in high dimension. The computational complexity of our test procedure is independent of the dimension and of the number of populations sampled. The test is applied to a high-dimensional problem: testing microarray data from cancer samples.
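As a rough illustration of the kind of procedure described above, the following Python sketch runs a two-sample approximate permutation test on a statistic built from pairwise Euclidean distances. The abstract does not state the statistic, so the form used here (a scaled difference between the mean between-sample distance and the mean within-sample distances) is an assumption. The function names `pairwise_dist`, `distance_stat`, and `energy_perm_test`, the default of 199 permutation replicates, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_stat(D, idx_x, idx_y):
    """Two-sample statistic from a pooled distance matrix D (assumed form)."""
    n1, n2 = len(idx_x), len(idx_y)
    between = D[np.ix_(idx_x, idx_y)].mean()
    within_x = D[np.ix_(idx_x, idx_x)].mean()
    within_y = D[np.ix_(idx_y, idx_y)].mean()
    return n1 * n2 / (n1 + n2) * (2.0 * between - within_x - within_y)


def energy_perm_test(x, y, n_perm=199, seed=0):
    """Approximate permutation test, conditioning on the pooled sample."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1, n = len(x), len(x) + len(y)
    # Distances are computed once; each replicate only re-indexes the matrix.
    D = pairwise_dist(pooled, pooled)
    observed = distance_stat(D, np.arange(n1), np.arange(n1, n))
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)  # random relabelling of the pooled sample
        if distance_stat(D, perm[:n1], perm[n1:]) >= observed:
            exceed += 1
    # Count the observed labelling as one of the replicates.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(30, 50))           # 30 points in dimension 50
    y = rng.normal(loc=1.0, size=(30, 50))  # same shape, shifted mean
    stat, p_value = energy_perm_test(x, y)
    print(f"statistic = {stat:.3f}, permutation p-value = {p_value:.3f}")
```

In this sketch the dimension enters only through the one-time computation of the pooled distance matrix, and each permutation replicate works on that matrix alone. The broadcasting in `pairwise_dist` holds an n-by-n-by-d array in memory, so a blockwise distance computation would be a better fit for very high-dimensional data.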
