Variable selection in multivariate linear models with high-dimensional covariance matrix estimation

In this paper, we propose a novel variable selection approach in the framework of multivariate linear models that takes into account the dependence that may exist between the responses. It consists in first estimating the covariance matrix of the responses and then plugging this estimator into a Lasso criterion, in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated from both a theoretical and a numerical point of view. More precisely, we give general conditions that the estimators of the covariance matrix and of its inverse have to satisfy in order to recover the positions of the zero and nonzero entries of the coefficient matrix when the size of the covariance matrix is not fixed and can tend to infinity. We prove that these conditions are satisfied in the particular case of some Toeplitz matrices. Our approach is implemented in the R package MultiVarSel, available from the Comprehensive R Archive Network (CRAN), and is very attractive since it has a low computational load. We also assess the performance of our methodology on synthetic data and compare it with alternative approaches. Our numerical experiments show that including the estimation of the covariance matrix in the Lasso criterion dramatically improves variable selection performance in many cases.
