A nonparametric two-sample test applicable to high dimensional data

The multivariate two-sample testing problem has been well investigated in the literature, and several parametric and nonparametric methods are available for it. However, most of these two-sample tests perform poorly for high-dimensional data, and many of them are not applicable when the dimension of the data exceeds the sample size. In this article, we propose a multivariate two-sample test that can be conveniently used in the high-dimension, low-sample-size setup. Asymptotic results on the power properties of our proposed test are derived for the case where the sample size remains fixed and the dimension of the data grows to infinity. We investigate the performance of this test on several high-dimensional simulated and real data sets, and demonstrate its superiority over several existing two-sample tests. We also study some theoretical properties of the proposed test for situations where the dimension of the data remains fixed and the sample size tends to infinity. In such cases, it turns out to be asymptotically distribution-free and consistent under general alternatives.
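
To illustrate why many classical tests break down when the dimension exceeds the sample size, consider the pooled sample covariance matrix that Hotelling's T² statistic must invert. Its rank is at most n1 + n2 - 2, so it is singular whenever the dimension d is larger than that. The following is a minimal sketch of this standard fact only, not of the test proposed in this article; the function name `pooled_covariance_rank`, the Gaussian data, and the chosen sizes are assumptions made for illustration.

```python
import numpy as np


def pooled_covariance_rank(n1, n2, d, seed=0):
    """Return the numerical rank of the pooled sample covariance matrix.

    Hypothetical helper for illustration: draws two independent Gaussian
    samples of sizes n1 and n2 in dimension d and pools their covariances.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n1, d))  # first sample, one row per observation
    y = rng.standard_normal((n2, d))  # second sample
    # Pooled covariance: weighted average of the two sample covariances.
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    return int(np.linalg.matrix_rank(pooled))


# Low dimension: the d x d matrix can have full rank and be inverted.
low_dim_rank = pooled_covariance_rank(n1=10, n2=10, d=5)

# High dimension, low sample size: rank cannot exceed n1 + n2 - 2 = 18 < d,
# so the matrix is singular and Hotelling's T^2 is undefined.
high_dim_rank = pooled_covariance_rank(n1=10, n2=10, d=50)

print("rank with d = 5: ", low_dim_rank)
print("rank with d = 50:", high_dim_rank)
```

Tests that depend on the data only through quantities that remain well defined for any d, such as pairwise distances between observations, avoid this particular obstruction.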