Estimating and Identifying Unspecified Correlation Structure for Longitudinal Data

Identifying the correlation structure is important for achieving estimation efficiency in longitudinal data analysis, and it is also crucial for drawing valid statistical inference for large clustered data. In this article, we propose a nonparametric method to estimate the correlation structure that is applicable to discrete longitudinal data. We use eigenvector-based basis matrices to approximate the inverse of the empirical correlation matrix and determine the number of basis matrices via model selection. A penalized objective function based on the difference between the empirical correlation matrix and its model-based approximation is adopted to select an informative structure for the correlation matrix. The eigenvector representation of the correlation estimator can reduce the risk of model misspecification, and it also provides useful information on the specific within-cluster correlation pattern of the data. We show that the proposed method possesses the oracle property and selects the true correlation structure consistently. The proposed method is illustrated through simulations and two data examples from air pollution and sonar signal studies.
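The description above does not spell out the exact basis construction, penalty, or selection rule, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the estimator proposed in the article. It builds an empirical correlation matrix from simulated within-cluster residuals, forms rank-one basis matrices M_k = v_k v_k^T from its eigenvectors, and fits coefficients for the inverse by a lasso-penalized Frobenius criterion. Because these basis matrices are orthonormal under the Frobenius inner product, the penalized fit reduces to soft-thresholding the eigenvalues of the inverse, and the number of nonzero coefficients plays the role of the number of basis matrices. The function names, the penalty weight `tau`, the lasso penalty itself, and the AR(1) simulation settings are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: not the method from the article; names, penalty,
# and simulation settings are illustrative assumptions.
import numpy as np


def empirical_correlation(residuals):
    """Empirical correlation matrix of an (n_clusters, m) residual array.

    Each row holds the m within-cluster residuals of one subject; balanced
    clusters (same m for every subject) are assumed for simplicity.
    """
    return np.corrcoef(residuals, rowvar=False)


def eigen_basis(corr):
    """Rank-one basis matrices v_k v_k^T built from the eigenvectors of corr."""
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending eigenvalues, orthonormal columns
    basis = [np.outer(v, v) for v in eigvecs.T]
    return eigvals, basis


def penalized_fit(corr, tau):
    """Lasso-penalized Frobenius fit of inv(corr) on the eigenvector basis.

    Minimizes ||inv(corr) - sum_k a_k M_k||_F^2 + tau * sum_k |a_k|.
    Since inv(corr) = sum_k (1 / lambda_k) M_k and the M_k are orthonormal
    under the Frobenius inner product, the minimizer is the soft-thresholded
    coefficient a_k = sign(c_k) * max(|c_k| - tau / 2, 0), c_k = 1 / lambda_k.
    """
    eigvals, basis = eigen_basis(corr)
    c = 1.0 / eigvals
    a = np.sign(c) * np.maximum(np.abs(c) - tau / 2.0, 0.0)
    approx_inv = sum(a_k * m_k for a_k, m_k in zip(a, basis))
    return a, basis, approx_inv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, rho = 5, 200, 0.6
    # Simulated AR(1) within-cluster correlation (illustrative only).
    true_corr = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    residuals = rng.multivariate_normal(np.zeros(m), true_corr, size=n)
    r_hat = empirical_correlation(residuals)
    for tau in (0.0, 1.0, 2.0):
        a, _, approx = penalized_fit(r_hat, tau)
        err = np.linalg.norm(np.linalg.inv(r_hat) - approx, "fro")
        print(f"tau={tau}: {np.count_nonzero(a)} basis matrices kept, "
              f"Frobenius error {err:.3f}")
```

With `tau = 0` the sketch reproduces the inverse exactly; larger values drop basis matrices at the cost of a larger approximation error. Because the basis here is orthonormal, the penalty only shrinks eigen-coefficients; combining non-orthogonal candidate matrices, or using penalties such as SCAD or the adaptive lasso, would require an iterative solver instead of the closed form.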
