Testing for the presence of significant covariates through conditional marginal regression

Summary

Researchers sometimes have a priori information on the relative importance of predictors that can be used to screen out covariates. An important question is whether any of the discarded covariates have predictive power when the most relevant predictors are included in the model. We consider testing whether any discarded covariate is significant conditional on some pre-chosen covariates. We propose a maximum-type test statistic and show that it has a nonstandard asymptotic distribution, giving rise to the conditional adaptive resampling test. To accommodate signals of unknown sparsity, we develop a hybrid test statistic, which is a weighted average of maximum- and sum-type statistics. We prove the consistency of the test procedure under general assumptions and illustrate how it can be used as a stopping rule in forward regression. We show, through simulation, that the proposed method provides adequate control of the familywise error rate with competitive power for both sparse and dense signals, even in high-dimensional cases, and we demonstrate its advantages in cases where the covariates are heavily correlated. We illustrate the application of our method by analysing an expression quantitative trait locus dataset.
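To make the ingredients of the summary concrete, the following is a minimal sketch of conditional marginal regression, not the authors' procedure. It assumes a linear model, partials the pre-chosen covariates out of the response and of each discarded covariate, and computes one marginal t-statistic per discarded covariate. The function names, the use of squared t-statistics, and the fixed `weight` are illustrative assumptions; the summary does not specify the normalisation or weighting of the hybrid statistic, and the resampling calibration needed to obtain p-values is not shown.

```python
import numpy as np


def residualize(a, Z):
    """Residuals of a (vector or matrix) after least-squares projection
    onto an intercept and the columns of Z."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])
    coef, *_ = np.linalg.lstsq(Z1, a, rcond=None)
    return a - Z1 @ coef


def conditional_marginal_stats(y, X_cond, X_rest, weight=0.5):
    """Marginal t-statistics for each column of X_rest, conditional on X_cond.

    Returns the t-statistics together with a maximum-type statistic,
    a sum-type statistic, and their weighted average.
    The weighting scheme here is an assumption made for illustration.
    """
    n = y.shape[0]
    q = X_cond.shape[1] + 1            # conditioning covariates plus intercept
    ry = residualize(y, X_cond)        # response with X_cond partialled out
    RX = residualize(X_rest, X_cond)   # each discarded covariate, partialled out

    sxx = np.sum(RX ** 2, axis=0)
    beta = RX.T @ ry / sxx             # one conditional marginal slope per covariate
    resid = ry[:, None] - RX * beta
    sigma2 = np.sum(resid ** 2, axis=0) / (n - q - 1)
    t = beta / np.sqrt(sigma2 / sxx)

    max_stat = np.max(t ** 2)          # sensitive to sparse signals
    sum_stat = np.mean(t ** 2)         # sensitive to dense signals
    hybrid = weight * max_stat + (1.0 - weight) * sum_stat
    return t, max_stat, sum_stat, hybrid
```

By the Frisch-Waugh-Lovell theorem, each t-statistic above coincides with the usual t-statistic for that covariate in an ordinary least-squares fit on the intercept, the pre-chosen covariates, and that single covariate. The maximum over these statistics does not follow a standard reference distribution, which is why a dedicated resampling calibration is required.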
