Testing covariates in high-dimensional regression

For the high-dimensional linear regression model, we propose a new procedure for testing the statistical significance of a subset of regression coefficients. Specifically, we use the partial covariances between the response variable and the tested covariates to construct a test statistic. The resulting test is applicable even when the predictor dimension is much larger than the sample size. Under the null hypothesis, together with boundedness and moment conditions on the predictors, we show that the proposed test statistic is asymptotically standard normal, a result further supported by Monte Carlo experiments. The test can also be extended to generalized linear models. The practical usefulness of the test is illustrated via an empirical example on paid search advertising. Copyright The Institute of Statistical Mathematics, Tokyo 2014.
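The notion of a partial covariance mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's test statistic: the standardization that yields the asymptotic standard normal limit is not given here and is omitted. The function names `residualize` and `partial_covariances` are hypothetical, and the sketch assumes the set of control covariates is small enough (fewer columns than observations) for ordinary least squares residuals to be well defined. The tested covariates, by contrast, may far outnumber the sample size.

```python
import numpy as np


def residualize(target, controls):
    """Return least-squares residuals of `target` on `controls` plus an intercept.

    Assumption: `controls` has fewer columns than rows, so the fit is well defined.
    """
    n = controls.shape[0]
    design = np.column_stack([np.ones(n), controls])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def partial_covariances(y, x_control, x_test):
    """Sample partial covariance between y and each column of x_test,
    after removing the linear effect of x_control from both.

    Illustration only: this is an unstandardized quantity, not the
    test statistic proposed in the paper.
    """
    y_res = residualize(y, x_control)
    x_res = residualize(x_test, x_control)
    return x_res.T @ y_res / len(y)


# Simulated data under the null: y depends only on the control covariates.
rng = np.random.default_rng(0)
n, n_control, n_test = 50, 3, 200  # tested dimension exceeds sample size
x_control = rng.normal(size=(n, n_control))
x_test = rng.normal(size=(n, n_test))
y = x_control @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

pc = partial_covariances(y, x_control, x_test)
print(pc.shape)  # (200,)
```

Under the null hypothesis the individual partial covariances should fluctuate around zero. Turning them into a single statistic with a known limiting distribution requires the variance estimate derived in the paper.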
