TEST OF SIGNIFICANCE FOR HIGH-DIMENSIONAL LONGITUDINAL DATA.

This paper concerns statistical inference for longitudinal data with ultrahigh-dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low-dimensional parameter of interest. The major challenge is constructing a powerful test statistic in the presence of high-dimensional nuisance parameters and complex within-subject correlation in longitudinal data. To address this challenge, we propose a new quadratic decorrelated inference function approach, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator when the dimension of the parameter of interest grows with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying Storey's (2002) procedure to the proposed test statistics for each regression parameter controls the FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite-sample performance of the proposed procedures. Our simulation results indicate that the proposed procedure controls both the Type I error for testing a low-dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedure to a real data example.
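
As a rough illustration of the multiple testing step described above, the following sketch turns per-parameter Wald-type z-statistics into two-sided p-values and applies a Storey-style q-value rule. It is a generic sketch under stated assumptions, not the paper's implementation: the function names `wald_pvalues` and `storey_qvalues`, the tuning value `lam=0.5`, and the toy inputs are all hypothetical, and the estimates and standard errors are assumed to come from some asymptotically normal estimator.

```python
import math
import numpy as np


def wald_pvalues(estimates, std_errors):
    """Two-sided p-values for H0: beta_j = 0, assuming asymptotic normality."""
    z = np.asarray(estimates, dtype=float) / np.asarray(std_errors, dtype=float)
    # P(|Z| > |z|) for standard normal Z equals erfc(|z| / sqrt(2)).
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def storey_qvalues(pvalues, lam=0.5):
    """Storey-style q-values; lam is an assumed tuning parameter in (0, 1)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    # Estimated proportion of true nulls, capped at 1.
    pi0 = min(1.0, np.sum(p > lam) / (m * (1.0 - lam)))
    order = np.argsort(p)
    ranked = p[order]
    # Estimated FDR when rejecting every hypothesis up to each sorted p-value.
    fdr = pi0 * m * ranked / np.arange(1, m + 1)
    # Make q-values monotone by taking running minima from the largest p-value.
    q_sorted = np.minimum(np.minimum.accumulate(fdr[::-1])[::-1], 1.0)
    q = np.empty(m)
    q[order] = q_sorted
    return q, pi0


# Toy inputs (made up for illustration only).
estimates = np.array([0.90, 0.02, -0.75, 0.05])
std_errors = np.array([0.20, 0.10, 0.20, 0.10])
pvals = wald_pvalues(estimates, std_errors)
qvals, pi0_hat = storey_qvalues(pvals, lam=0.5)
rejected = qvals <= 0.05  # hypotheses rejected at nominal FDR level 0.05
```

Rejecting the hypotheses with q-value at most the nominal level is one common way to apply the rule; the choice of `lam` trades bias against variance in the estimate of the null proportion.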

[1]  Jianqing Fan, et al.  I-LAMM for Sparse Learning: Simultaneous Control of Algorithmic Complexity and Statistical Error, 2015, Annals of Statistics.

[2]  E. Candès, et al.  Controlling the false discovery rate via knockoffs, 2014, 1404.5609.

[3]  Po-Ling Loh, et al.  Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima, 2013, J. Mach. Learn. Res.

[4]  Martin J. Wainwright, et al.  Regularized M-estimators with nonconvexity, 2015.

[5]  Han Liu, et al.  A General Theory of Hypothesis Tests and Confidence Regions for Sparse High Dimensional Models, 2014, 1412.8765.

[6]  Tuo Zhao, et al.  Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory, 2014, ArXiv.

[7]  Ethan X. Fang, et al.  Testing and confidence intervals for high dimensional proportional hazards models, 2014, 1412.5158.

[8]  A. Qu, et al.  Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates, 2014, 1405.6030.

[9]  S. Geer, et al.  On asymptotically optimal confidence regions and tests for high-dimensional models, 2013, 1303.0518.

[10]  Adel Javanmard, et al.  Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models, 2013.

[11]  Runze Li, et al.  Calibrating Non-Convex Penalized Regression in Ultra-High Dimension, 2013, Annals of Statistics.

[12]  R. Tibshirani, et al.  Sequential selection procedures and false discovery rate control, 2013, 1309.5352.

[13]  Weidong Liu.  Gaussian graphical model estimation with false discovery rate control, 2013, 1306.0976.

[14]  Li Wang, et al.  Simultaneous variable selection and estimation in semiparametric modeling of longitudinal/clustered data, 2013, 1302.0151.

[15]  S. Geer, et al.  Quasi-Likelihood and/or Robust Estimation in High Dimensions, 2012, 1206.6721.

[16]  Annie Qu, et al.  Penalized Generalized Estimating Equations for High‐Dimensional Longitudinal Data Analysis, 2012, Biometrics.

[17]  Cun-Hui Zhang, et al.  Confidence intervals for low dimensional parameters in high dimensional linear models, 2011, 1110.2563.

[18]  Sara van de Geer, et al.  Statistics for High-Dimensional Data: Methods, Theory and Applications, 2011.

[19]  Lan Wang, et al.  GEE analysis of clustered binary data with diverging number of covariates, 2011, 1103.1795.

[20]  A. Qu, et al.  Consistent Model Selection for Marginal Generalized Additive Model for Correlated Data, 2010.

[21]  Annie Qu, et al.  Consistent model selection and data‐driven smooth tests for longitudinal data in the estimating equations approach, 2009.

[22]  C. Jaquish, et al.  The Framingham Heart Study, on its way to becoming the gold standard for Cardiovascular Genetic Epidemiology?, 2007, BMC Medical Genetics.

[23]  V. Bentkus.  On the dependence of the Berry–Esseen bound on dimension, 2003.

[24]  Hongyu Zhao, et al.  Interacting genetic loci on chromosomes 20 and 10 influence extreme human obesity, 2003, American Journal of Human Genetics.

[25]  John D. Storey.  A direct approach to false discovery rates, 2002.

[26]  Jianqing Fan, et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[27]  Y. Benjamini, et al.  The Control of the False Discovery Rate in Multiple Testing under Dependency, 2001.

[28]  B. Lindsay, et al.  Improving generalised estimating equations using quadratic inference functions, 2000.

[29]  R. Tibshirani.  Regression Shrinkage and Selection via the Lasso, 1996.

[30]  Y. Benjamini, et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[31]  S. Zeger, et al.  Longitudinal data analysis using generalized linear models, 1986.