High-dimensional two-sample mean vectors test and support recovery with factor adjustment

Abstract Testing the equality of two mean vectors is a classical problem in multivariate analysis. In this article, we consider the test in the high-dimensional setting. Existing tests often assume that the covariance matrix (or its inverse) of the underlying variables is sparse, an assumption that rarely holds in social science because of latent common factors. We introduce a maximum-type test statistic based on factor-adjusted data. The factor-adjustment step increases the signal-to-noise ratio and thus yields a more powerful test. We derive the limiting null distribution of the maximum-type test statistic, which is the type I extreme value distribution. Because the distribution of the test statistic is well known to converge slowly to this limit, we also propose a multiplier bootstrap method to improve finite-sample performance. In addition, we propose a multiple testing procedure with false discovery rate (FDR) control for identifying the specific locations at which the two groups differ significantly. Thorough numerical studies demonstrate the advantages of the test over other state-of-the-art tests. The performance of the test is also assessed on a real stock market dataset.
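
To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It assumes that both groups share the same factor loadings, that the number of factors `k` is known, and that the mean difference is sparse, so that projecting the data onto the orthogonal complement of the estimated loading space leaves the signal nearly intact. The function names `factor_adjust` and `max_test_bootstrap`, the PCA-based loading estimate, the Gaussian multipliers, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def factor_adjust(X, Y, k):
    """Project both samples onto the complement of an estimated loading space.

    Assumption: loadings are estimated by the top-k right singular vectors
    of the pooled, group-centered data (a PCA-style estimate).
    """
    Z = np.vstack([X - X.mean(axis=0), Y - Y.mean(axis=0)])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:k].T                      # p x k orthonormal basis
    # Projecting the raw (uncentered) rows removes the factor part of each
    # observation, and hence also of each sample mean.
    return X - (X @ V) @ V.T, Y - (Y @ V) @ V.T

def max_test_bootstrap(X, Y, n_boot=500, seed=0):
    """Maximum-type two-sample statistic with a Gaussian multiplier bootstrap."""
    rng = np.random.default_rng(seed)
    n1, n2 = X.shape[0], Y.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # Coordinate-wise variance of the difference in sample means.
    var = X.var(axis=0, ddof=1) / n1 + Y.var(axis=0, ddof=1) / n2
    t2 = (X.mean(axis=0) - Y.mean(axis=0)) ** 2 / var
    stat = t2.max()
    # Each bootstrap draw reweights the centered rows by N(0, 1) multipliers.
    e1 = rng.standard_normal((n_boot, n1))
    e2 = rng.standard_normal((n_boot, n2))
    diff = e1 @ Xc / n1 - e2 @ Yc / n2          # shape (n_boot, p)
    boot = (diff ** 2 / var).max(axis=1)
    pval = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, pval, t2

# Simulated example (hypothetical data): two latent factors, one shifted mean.
rng = np.random.default_rng(1)
n1, n2, p, k = 60, 60, 100, 2
B = rng.standard_normal((p, k))                  # shared factor loadings
X = rng.standard_normal((n1, k)) @ B.T + rng.standard_normal((n1, p))
Y = rng.standard_normal((n2, k)) @ B.T + rng.standard_normal((n2, p))
Y[:, 0] += 2.0                                   # sparse mean difference
Xa, Ya = factor_adjust(X, Y, k)
stat, pval, t2 = max_test_bootstrap(Xa, Ya)
print(f"max statistic = {stat:.2f}, bootstrap p-value = {pval:.4f}")
```

Calibrating the critical value by resampling rather than by the extreme value limit is the design choice the abstract motivates: the maximum of many standardized differences approaches its limiting distribution slowly, so the simulated null maxima can be more reliable at moderate dimensions. The coordinate-wise values in `t2` are the natural input to a subsequent support-recovery step, which this sketch does not implement.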
