Bayesian Optimal Two-sample Tests in High-dimension

We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications, including genomics and medical imaging, it is natural to assume that only a few entries of the two mean vectors or covariance matrices differ. Many existing tests that rely on aggregating the differences between empirical means or covariance matrices are not optimal, or yield low power, in such settings. Motivated by this, we develop Bayesian two-sample tests based on a divide-and-conquer idea, which is especially powerful when the difference between the two populations is sparse but large. The proposed tests admit closed-form Bayes factors and allow scalable computation even in high dimensions. We prove that the proposed tests are consistent under relatively mild conditions compared to existing tests in the literature. Furthermore, the testable regions of the proposed tests turn out to be rate-optimal. Simulation studies show clear advantages of the proposed tests over other state-of-the-art methods in various scenarios. We also apply our tests to gene expression data from two cancer data sets.
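
To make the sparsity argument concrete, the following minimal sketch contrasts an aggregate statistic with a coordinate-wise maximum when only one coordinate differs. It is an illustration under stated assumptions, not the Bayes-factor tests proposed here: the max-type statistic is a simple frequentist stand-in for the divide-and-conquer idea, and the function names `coordinate_z_scores`, `sum_statistic`, and `max_statistic` are hypothetical.

```python
import math
from statistics import mean, variance


def coordinate_z_scores(x, y):
    """Welch-type z-score for each coordinate; x and y are lists of rows."""
    n1, n2 = len(x), len(y)
    p = len(x[0])
    scores = []
    for j in range(p):
        xj = [row[j] for row in x]
        yj = [row[j] for row in y]
        # Standard error of the difference in sample means for coordinate j.
        se = math.sqrt(variance(xj) / n1 + variance(yj) / n2)
        scores.append((mean(xj) - mean(yj)) / se)
    return scores


def sum_statistic(scores):
    """Aggregate statistic: centered and scaled sum of squared z-scores."""
    p = len(scores)
    return (sum(z * z for z in scores) - p) / math.sqrt(2 * p)


def max_statistic(scores):
    """Coordinate-wise statistic: largest absolute z-score."""
    return max(abs(z) for z in scores)


# Stylized z-scores (an assumption for illustration): 999 coordinates of
# typical null size 1 and a single strongly differing coordinate of size 6.
sparse_scores = [6.0] + [1.0] * 999
print(sum_statistic(sparse_scores))   # aggregate view of the difference
print(max_statistic(sparse_scores))   # coordinate-wise view of the difference
print(math.sqrt(2 * math.log(1000)))  # rough null scale of the maximum
```

In this stylized input the single large coordinate barely moves the aggregate statistic, because it is averaged against the other 999 coordinates, while the maximum sits well above the rough null scale sqrt(2 log p). That is the sense in which aggregation loses power against sparse but large differences.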
