Tests for high dimensional data based on means, spatial signs and spatial ranks

Tests based on sample mean vectors and sample spatial signs have been studied in the recent literature for high dimensional data with the dimension larger than the sample size. For suitable sequences of alternatives, we show that the powers of the mean based tests and of the tests based on spatial signs and ranks tend to be the same as the data dimension grows to infinity, for any sample size, when the coordinate variables satisfy appropriate mixing conditions. Further, their limiting powers do not depend on the heaviness of the tails of the distributions. This is in striking contrast to the asymptotic results obtained in the classical multivariate setup. On the other hand, we show that in the presence of stronger dependence among the coordinate variables, the spatial sign and rank based tests for high dimensional data can be asymptotically more powerful than the mean based tests if, in addition to the data dimension, the sample size also grows to infinity. The sizes of some mean based tests for high dimensional data studied in the recent literature are observed to be significantly different from their nominal levels. This is due to the inadequacy of the asymptotic approximations used for the distributions of those test statistics. However, our asymptotic approximations for the tests based on spatial signs and ranks are observed to work well when the tests are applied to a variety of simulated and real datasets.
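For readers unfamiliar with the building blocks named above, the following minimal sketch shows the standard definitions: the spatial sign of a vector x is x/||x|| (and zero when x is zero), and the spatial rank of x relative to a sample is the average of the spatial signs of the differences x - X_i. The function names `spatial_sign` and `spatial_rank` are hypothetical, chosen for illustration only. The sketch does not reproduce the test statistics or asymptotic approximations of the paper.

```python
import math


def spatial_sign(x):
    """Return x / ||x||, or the zero vector when x is the zero vector."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in x]


def spatial_rank(x, sample):
    """Average of the spatial signs of x - X_i over all X_i in the sample."""
    total = [0.0] * len(x)
    for xi in sample:
        diff = [a - b for a, b in zip(x, xi)]
        sign = spatial_sign(diff)
        total = [t + s for t, s in zip(total, sign)]
    return [t / len(sample) for t in total]


# Usage: an observation and a hugely rescaled copy of it share the same
# spatial sign, because only the direction of the vector is retained.
small = spatial_sign([3.0, 4.0])
large = spatial_sign([3.0e6, 4.0e6])
same_direction = all(abs(a - b) < 1e-12 for a, b in zip(small, large))
```

Because a spatial sign discards the length of a vector, a single extreme observation contributes a vector of norm at most one, whereas it can dominate a sample mean.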
