Maximum-type tests for high-dimensional regression coefficients using Wilcoxon scores

Abstract: In this article, we develop new maximum-type tests, based on Wilcoxon scores, for the overall significance of the coefficients in high-dimensional linear models. The proposed testing procedures require no estimate of the error variance and are robust to heavy-tailed distributions and outliers, making them widely applicable in practice. We incorporate the dependence structure among predictors into the test statistics to enhance their power. Under regularity conditions, the limiting null distributions of the test statistics are shown to be the type I extreme value distribution. To reduce size distortion, we further propose a multiplier bootstrap method based on high-dimensional Gaussian approximations, which imposes no structural assumptions on the unknown covariance matrices. We also evaluate the power of the proposed tests theoretically in comparison with two existing methods. The effectiveness of the proposed tests in finite samples is illustrated through simulation studies and a real data application.
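As a rough illustration of the ingredients named in the abstract, the sketch below combines Wilcoxon scores, a maximum-type statistic, and a Gaussian multiplier bootstrap. It is a simplified stand-in under stated assumptions, not the authors' procedure: it uses only marginal standardization of each predictor and does not incorporate the dependence structure among predictors. The function names `wilcoxon_scores` and `max_test_pvalue`, the score scaling, and the default of 500 bootstrap draws are all hypothetical choices made for this example.

```python
import numpy as np


def wilcoxon_scores(y):
    """Centered, scaled Wilcoxon scores sqrt(12) * (R_i / (n + 1) - 1/2)."""
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1  # ranks 1..n, assuming no ties
    return np.sqrt(12.0) * (ranks / (n + 1.0) - 0.5)


def max_test_pvalue(X, y, n_boot=500, seed=0):
    """Simplified max-type rank test of H0: all slope coefficients are zero.

    Assumption: each coordinate is standardized marginally; this is an
    illustrative choice, not the statistic proposed in the article.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                # center each predictor
    scale = Xc.std(axis=0, ddof=1)         # marginal standard deviations
    Z = Xc * wilcoxon_scores(y)[:, None]   # n x p matrix of summands
    denom = np.sqrt(n) * scale

    # Observed statistic: largest standardized absolute rank-score sum.
    stat = np.max(np.abs(Z.sum(axis=0)) / denom)

    # Multiplier bootstrap: reweight centered summands by N(0, 1) draws.
    Zc = Z - Z.mean(axis=0)
    E = rng.standard_normal((n_boot, n))
    boot = np.max(np.abs(E @ Zc) / denom, axis=1)

    pval = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, pval
```

Calling `max_test_pvalue(X, y)` with an `n x p` predictor matrix and a length-`n` response returns the statistic and a bootstrap p-value. Because the response enters only through its ranks, no error variance estimate is needed and extreme response values have bounded influence.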
