Testing the equality of multivariate means when $p > n$ by combining the Hotelling and Simes tests

We propose a method for testing for a shift between the mean vectors of two multivariate Gaussian random variables in a high-dimensional setting, accounting for possible dependence among the coordinates and allowing $p > n$. The method combines two well-known tests: the Hotelling test and the Simes test. At each iteration, a subset of the dimensions is sampled and tested with the Hotelling test; the resulting p-values are then combined using the Simes test. We prove that the procedure is asymptotically valid. It can be extended to handle unequal covariance matrices by plugging in an appropriate extension of the Hotelling test. In a simulation study, we show that the proposed test has advantages over state-of-the-art tests in many scenarios and is robust to violations of the Gaussian assumption.
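
The combination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the subset size, the number of iterations, and the choice to draw each subset independently without replacement are all assumptions made for illustration. The sketch uses the standard pooled-covariance two-sample Hotelling $T^2$ test on each sampled subset of dimensions and the usual Simes combination $\min_i \, m\,p_{(i)}/i$ of the resulting p-values.

```python
import numpy as np
from scipy import stats


def hotelling_pvalue(x, y):
    """Two-sample Hotelling T^2 p-value with pooled covariance.

    x, y: arrays of shape (n1, d) and (n2, d); requires d < n1 + n2 - 1.
    """
    n1, d = x.shape
    n2 = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a scalar when d == 1
    t2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff)
    # Under the null, this rescaled T^2 follows an F(d, n1 + n2 - d - 1) law.
    f_stat = (n1 + n2 - d - 1) / (d * (n1 + n2 - 2)) * t2
    return float(stats.f.sf(f_stat, d, n1 + n2 - d - 1))


def simes_pvalue(pvalues):
    """Simes combined p-value: min over i of m * p_(i) / i."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    return float(np.min(m * p / np.arange(1, m + 1)))


def subset_hotelling_simes(x, y, subset_size, n_iter, seed=0):
    """Hypothetical helper: Hotelling tests on random subsets, combined by Simes.

    subset_size and n_iter are illustrative tuning parameters (assumptions),
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_dims = x.shape[1]
    pvals = []
    for _ in range(n_iter):
        # Sample a subset of coordinates small enough for Hotelling's test.
        idx = rng.choice(n_dims, size=subset_size, replace=False)
        pvals.append(hotelling_pvalue(x[:, idx], y[:, idx]))
    return simes_pvalue(pvals)
```

For example, with two samples of 10 observations each in 50 dimensions, calling `subset_hotelling_simes(x, y, subset_size=3, n_iter=20)` returns a single combined p-value even though $p > n$, because each Hotelling test sees only three coordinates at a time.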
