Sparse estimation of large covariance matrices via a nested Lasso penalty

We propose a new estimator for large covariance matrices when the variables have a natural ordering. Using the Cholesky decomposition of the inverse covariance matrix, we impose a banded structure on the Cholesky factor and select the bandwidth adaptively for each row of the factor, using a novel penalty we call the nested Lasso. This structure is more flexible than regular banding, yet, unlike the regular Lasso applied to the entries of the Cholesky factor, it yields a sparse estimator of the inverse covariance matrix. We develop an iterative algorithm for solving the resulting optimization problem. The estimator is compared with a number of other covariance estimators and performs best among them, both in simulations and on a real data example. The simulations also show that the margin by which the estimator outperforms its competitors tends to increase with dimension.
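To make the structure concrete, the following is a minimal sketch of a Cholesky-based estimator with a separate bandwidth for each row. It assumes the standard modified Cholesky parameterization, in which each variable is regressed on its immediate predecessors and the inverse covariance is assembled as T' D^{-1} T. The function name `banded_cholesky_precision` and the `bandwidths` argument are hypothetical, and the bandwidths are supplied by hand here rather than selected by the nested Lasso penalty, whose exact form is not given in this abstract.

```python
import numpy as np

def banded_cholesky_precision(X, bandwidths):
    """Inverse covariance estimate from row-wise banded regressions.

    X          : (n, p) data matrix, columns in their natural order.
    bandwidths : length-p sequence; bandwidths[j] is the number of immediate
                 predecessors that variable j is regressed on. ASSUMPTION:
                 given by the user, not chosen by the nested Lasso.
    Returns (omega, T, d) with omega = T' diag(1/d) T.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center each column
    T = np.eye(p)                    # unit lower-triangular Cholesky factor
    d = np.empty(p)                  # residual variances
    for j in range(p):
        k = min(int(bandwidths[j]), j)   # cannot use more predecessors than exist
        if k == 0:
            resid = Xc[:, j]
        else:
            Z = Xc[:, j - k:j]           # the k nearest predecessors
            phi, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
            T[j, j - k:j] = -phi         # row j is zero outside its band
            resid = Xc[:, j] - Z @ phi
        d[j] = resid @ resid / n
    omega = T.T @ (T / d[:, None])       # T' D^{-1} T
    return omega, T, d
```

Because every row of T is zero outside a contiguous band ending at the diagonal, entry (i, l) of the result can be nonzero only if some row's band covers both i and l. This is why row-wise banding of the factor carries over to sparsity in the inverse, whereas zeros scattered arbitrarily through the factor generally do not.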
